The present invention (Fig. 6) relates to a voice browser to provide
interactive services. A markup language document in accordance with
the present invention includes a dialog element (2) including a
plurality of markup language elements (3-26). Each of the plurality
of markup language elements is identifiable by at least one markup
tag. A step element (11) is contained within the dialog element. The
step element includes a prompt element (4) and an input element (9).
The prompt element (4) includes an announcement to be read to the
user. The input element includes at least one input that corresponds
to a user input. A method in accordance with the present invention
includes the steps of creating a markup language document having a
plurality of elements (3-26), selecting a prompt element (2), and
defining a voice communication (14) in the prompt element to be read
to the user. The method further includes the steps of selecting an
input element (2) and defining an input variable to store data
inputted by the user.

MARKUP LANGUAGE FOR INTERACTIVE SERVICES AND
METHODS THEREOF



Notice of Copyright

A portion of the disclosure of this patent document
contains material which is subject to copyright
protection. The copyright owner has no objection to the
facsimile reproduction by anyone of the patent document
or the patent disclosure, as it appears in the Patent
and Trademark Office patent files or records, but
otherwise reserves all copyright rights and similar
rights whatsoever.

Field of the Invention

The present invention generally relates to
information retrieval, and more particularly, to
methods and systems to allow a user to access
information from an information source.

Background of the Invention 

On-line electronic information services are being
increasingly utilized by individuals having personal
computers to retrieve various types of information.
Typically, a user having a personal computer equipped
with a modem dials into a service provider, such as an
Internet gateway, an on-line service (such as America
Online, CompuServe, or Prodigy), or an electronic
bulletin board to download data representative of the
information desired by the user.
The information from the service provider is
typically downloaded in real-time (i.e., the information
is downloaded contemporaneously with a request for the
information). Examples of information downloaded in
this manner include electronic versions of newspapers,
books (i.e., an encyclopedia), articles, financial
information, etc. The information can include both text
and graphics in any of these examples.

Brief Description of the Drawings

The invention is pointed out with particularity in
the appended claims. However, other features of the
invention will become more apparent and the invention
will be best understood by referring to the following
detailed description in conjunction with the
accompanying drawings in which:

FIG. 1 is a block diagram of an embodiment of a
system in accordance with the present invention;

FIG. 2 is a flow diagram of a method of retrieving
information from an information source;

FIG. 3 is an exemplary block diagram of another
embodiment of a system in accordance with the present
invention;

FIG. 4 is a block diagram of a voice browser of the
system of FIG. 3;

FIGS. 5a-5c are flow diagrams of a routine carried
out by the voice browser of FIG. 4;

FIG. 6 is an exemplary markup language document;

FIG. 7 is a diagrammatic illustration of a
hierarchical structure of the markup language document
of FIG. 6;

FIG. 8 is an exemplary state diagram of a markup
language document; and

FIG. 9 is another exemplary state diagram of an
exemplary application of a markup language document.

Detailed Description of the Preferred Embodiments

Before explaining the present embodiments in
detail, it should be understood that the invention is
not limited in its application or use to the details of
construction and arrangement of parts illustrated in the
accompanying drawings and description. It will be
recognized that the illustrative embodiments of the
invention may be implemented or incorporated in other
embodiments, variations and modifications, and may be
practiced or carried out in various ways. Furthermore,
unless otherwise indicated, the terms and expressions
employed herein have been chosen for the purpose of
describing the illustrative embodiments of the present
invention for the convenience of the reader and are not
for the purpose of limitation.

Referring now to the drawings, and more
particularly to FIG. 1, a block diagram of a system 100
is illustrated to enable a user to access information.
The system 100 generally includes one or more network
access apparatus 102 (one being shown), an electronic
network 104, and one or more information sources or
content providers 106 (one being shown).

The electronic network 104 is connected to the
network access apparatus 102 via a line 108, and the
electronic network 104 is connected to the information
source 106 via a line 110. The lines 108 and 110 can
include, but are not limited to, a telephone line or
link, an ISDN line, a coaxial line, a cable television
line, a fiber optic line, a computer network line, a
digital subscriber line, or the like. Alternatively,
the network access apparatus 102 and the information
source 106 can wirelessly communicate with the
electronic network. For example, the electronic network
104 can provide information to the network access
apparatus 102 by a satellite communication system, a
wireline communication system, or a wireless
communication system.

The system 100 enables users to access information
from any location in the world via any suitable network
access device. The users can include, but are not
limited to, cellular subscribers, wireline subscribers,
paging subscribers, satellite subscribers, mobile or
portable phone subscribers, trunked radio subscribers,
computer network subscribers (i.e., internet
subscribers, intranet subscribers, etc.), branch office
users, and the like.

The users can preferably access information from
the information source 106 using voice inputs or
commands. For example, the users can access up-to-date
information, such as news updates, designated city
weather, traffic conditions, stock quotes, calendar
information, user information, address information, and
stock market indicators. The system also allows the
users to perform various transactions (i.e., order
flowers, place orders from restaurants, place buy and
sell stock orders, obtain bank account balances, obtain
telephone numbers, receive directions to various
destinations, etc.).

As shown in FIG. 1, a user utilizes the network
access apparatus 102 of the system 100 to communicate
and/or connect with the electronic network 104. The
electronic network 104 retrieves information from the
information source 106 based upon speech commands or
DTMF tones from the user. The information is preferably
stored in a database or storage device (not shown) of
the information source 106. The information source 106
can include one or more server computers (not shown).
The information source can be integrated into the
electronic network 104 or can be remote from the
electronic network (i.e., at a content provider's
facilities). It will also be recognized that the
network access apparatus 102, the electronic network
104, and the information source 106 can be integrated in
a single system or device.

The information of the information source 106 can
be accessed over any suitable communication medium. The
information source 106 can be identified by an
electronic address using at least a portion of a URL
(Uniform Resource Locator), a URN (Uniform Resource
Name), an IP (Internet Protocol) address, an electronic
mail address, a device address (i.e., a pager number), a
direct point to point connection, a memory address, etc.
It is noted that a URL can include: a protocol, a domain
name, a path, and a filename. URL protocols include:
"file:" for accessing a file stored on a local storage
medium; "ftp:" for accessing a file from an FTP (file
transfer protocol) server; "http:" for accessing an HTML
(hypertext markup language) document; "gopher:" for
accessing a Gopher server; "mailto:" for sending an
e-mail message; "news:" for linking to a Usenet newsgroup;
"telnet:" for opening a telnet session; and "wais:" for
accessing a WAIS server.
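
By way of illustration only, the four URL parts named above (protocol, domain name, path, and filename) can be separated with the Python standard library as sketched below. The function name split_url and the example address are assumptions made for this sketch and do not appear in the disclosure.

```python
from urllib.parse import urlparse
import posixpath


def split_url(url):
    """Split a URL into the protocol, domain name, path, and filename."""
    parts = urlparse(url)
    # Separate the directory portion of the path from the final filename.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "path": directory,
        "filename": filename,
    }


# Hypothetical address used only for illustration.
print(split_url("http://www.example.com/weather/chicago.html"))
```

The same function accepts the other protocols listed above (for example "ftp:" or "file:"), since urlparse treats the scheme generically.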

Once the electronic network 104 of the system 100
receives the information from the information source
106, the electronic network sends the information to the
network access apparatus 102. The electronic network 104
can include an open, wide area network such as the
Internet, the World Wide Web (WWW), and/or an on-line
service. The electronic network 104 can also include,
but is not limited to, an intranet, an extranet, a local
area network, a telephone network (i.e., a public
switched telephone network), a cellular telephone
network, a personal communication system (PCS) network,
a television network (i.e., a cable television system),
a paging network (i.e., a local paging network), a
regional paging network, a national or a global paging
network, an email system, a wireless data network (i.e.,
a satellite data network or a local wireless data
network), and/or a telecommunication node.

The network access apparatus 102 of the system 100
allows the user to access (i.e., view and/or hear) the
information retrieved from the information source. The
network access apparatus can provide the information to
the user as machine readable data, human readable data,
audio or speech communications, textual information,
graphical or image data, etc. The network access
apparatus can have a variety of forms, including but not
limited to, a telephone, a mobile phone, an office
phone, a home phone, a pay phone, a paging unit, a radio
unit, a web phone, a personal information manager (PIM),
a personal digital assistant (PDA), a general purpose
computer, a network television, an Internet television,
an Internet telephone, a portable wireless device, a
workstation, or any other suitable communication device.
It is contemplated that the network access device can be
integrated with the electronic network. For example,
the network access device, the electronic network,
and/or the information source can reside in a personal
computer.

The network access apparatus 102 may also include a
voice or web browser, such as a Netscape Navigator® web
browser, a Microsoft Internet Explorer® web browser, a
Mosaic® web browser, etc. It is also contemplated that
the network access apparatus 102 can include an optical
scanner or bar code reader to read machine readable
data, magnetic data, optical data, or the like, and
transmit the data to the electronic network 104. For
example, the network access apparatus could read or scan
a bar code and then provide the scanned data to the
electronic network 104 to access the information from
the information source (i.e., a menu of a restaurant,
banking information, a web page, weather information,
etc.).

FIG. 2 illustrates a flow diagram of a method of
retrieving information from a destination or database of
the information source 106. At block 150, a user calls
into the electronic network 104 from a network access
apparatus. After the electronic network answers the
incoming call at block 152, the electronic network can
attempt to verify that the user is a subscriber of the
system and/or the type of network access apparatus the
user is calling from. For example, the system may read
and decode the automatic number identification (ANI) or
caller line identification (CLI) of the call and then
determine whether the ANI or CLI of the call is found in
a stored ANI or CLI list of subscribers. The system may
also identify the user by detecting a unique speech
pattern from the user (i.e., speaker verification) or a
PIN entered using voice commands or DTMF tones.

After the electronic network answers the call, the
electronic network provides a prompt or announcement to
the caller at block 154 (i.e., "Hi. This is your
personal agent. How may I help you"). The electronic
network can also set grammars (i.e., vocabulary) and
personalities (i.e., male or female voices) for the
call. The electronic network can load the grammars and
personalities based upon the CLI, the network access
apparatus, or the identity of the user. For example,
the grammars and personalities can be set or loaded
depending upon the type of device (i.e., a wireless
phone), the gender of the caller (i.e., male or female),
the type of language (i.e., English, Spanish, etc.), and
the accent of the caller (i.e., a New York accent, a
southern accent, an English accent, etc.). It is also
contemplated that the personalities and grammars may be
changed by the user or changed by the electronic network
based upon the speech communications detected by the
electronic network.

At block 156, the electronic network waits for an
input or command from the user that corresponds to a
destination of the information source desired by the
user. The input can be audio commands (i.e., speech) or
DTMF tones. After the electronic network receives the
input from the user, the electronic network establishes
a connection or a link to the information source at
block 158. The electronic network preferably determines
an electronic address of the information source (i.e.,
a URL, a URN, an IP address, or an electronic mail
address) based upon the inputs from the user (i.e.,
speech or DTMF tones). The electronic address can be
retrieved from a database using a look-up operation
based upon at least a portion of the input.
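
The look-up operation described above can be sketched, under stated assumptions, as a keyword search over a small table. The table contents, the example addresses, and the function name lookup_address are hypothetical; the disclosure does not specify how the database is organized, and the sketch assumes the user input has already been converted to text.

```python
# Hypothetical table mapping keywords to electronic addresses.
ADDRESS_TABLE = {
    "weather": "http://weather.example.com/today",
    "news": "http://news.example.com/headlines",
    "stocks": "http://quotes.example.com/market",
}


def lookup_address(user_input):
    """Return the address for the first keyword found in the input, else None.

    Only a portion of the input needs to match: each word is checked
    against the table after punctuation is stripped.
    """
    for word in user_input.lower().split():
        address = ADDRESS_TABLE.get(word.strip(".,?!"))
        if address is not None:
            return address
    return None


print(lookup_address("What is the weather today?"))
```

A return value of None corresponds to the case in which no destination can be determined from the input, at which point a system of this kind would presumably prompt the user again.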

At block 160, the electronic network retrieves at
least a portion of the information from the destination
of the information source. The electronic
network processes the information and then provides an
output to the user based upon the retrieved information
at block 162. The output can include a speech
communication, textual information, and/or graphical
information. For example, the electronic network can
provide a speech communication using text-to-speech
technology or human recorded speech. The process then
proceeds to block 164 or block 154 as described above.
It will be recognized that the above described method
can be carried out by a computer.

Referring now to FIG. 3, an exemplary block diagram
of an embodiment of a system 200 to enable a user to
access information is shown. The system 200 enables a
user to access information from any location in the
world via a suitable communication device. The system
200 can provide access to yellow pages, directions,
traffic, addresses, movies, concerts, airline
information, weather information, news reports, financial
information, flowers, personal data, calendar data,
address data, gifts, books, etc. The user can also
perform a series of transactions without having to
terminate the original call to the system. For example,
the user can access a news update and obtain weather
information, all without having to dial additional
numbers or terminate the original call. The system 200
also enables application developers to build
interactive speech applications using a
markup language, such as VoxML™ voice markup language
developed by Motorola, Inc.

The system 200 generally includes one or more
communication devices or network access apparatus 201,
202, 203 and 204 (four being shown), an electronic
network 206, and one or more information sources, such
as content providers 208 and 209 (two being shown) and
markup language servers. The user can retrieve the
information from the information sources using speech
commands or DTMF tones.

The user can access the electronic network 206 by
dialing a single direct access telephone number (i.e., a
foreign exchange number, a local number, or a toll-free
number or PBX) from the communication device 202. The
user can also access the electronic network 206 from the
communication device 204 via the internet, from the
communication device 203 via a paging network 211, and
from the communication device 201 via a local area
network (LAN), a wide area network (WAN), or an email
connection.

The communication devices can include, but are not
limited to, landline or wireline devices (i.e., home
phones, work phones, computers, facsimile machines, pay
phones), wireless devices (i.e., mobile phones, trunked
radios, handheld devices, PIMs, PDAs, etc.), network
access devices (i.e., computers), pagers, etc. The
communication devices can include a microphone, a
speaker, and/or a display.

As shown in FIG. 3, the electronic network 206 of
the system 200 includes a telecommunication network 210
and a communication node 212. The telecommunication
network 210 is preferably connected to the communication
node 212 via a high-speed data link, such as a T1
telephone line, a local area network (LAN), or a wide
area network (WAN). The telecommunication network 210
preferably includes a public switched telephone network
(PSTN) 214 and a carrier network 216. The
telecommunication network 210 can also include
international or local exchange networks, cable
television networks, interexchange carrier (IXC)
networks or long distance carrier networks, cellular
networks (i.e., mobile switching centers (MSC)), PBXs,
satellite systems, and other switching centers such as
conventional or trunked radio systems (not shown), etc.



WO 00/05643 PCT/US99/16777

The PSTN 214 of the telecommunication network 210
can include various types of communication equipment or
apparatus, such as ATM networks, Fiber Distributed Data
Interface (FDDI) networks, T1 lines, cable television
networks and the like. The carrier network 216 of the
telecommunication network 210 generally includes a
telephone switching system or central office 218. It
will be recognized that the carrier network 216 can be
any suitable system that can route calls to the
communication node 212, and the telephone switching
system 218 can be any suitable wireline or wireless
switching system.

The communication node 212 of the system 200 is
preferably configured to receive and process incoming
calls from the carrier network 216 and the internet 220,
such as the WWW. The communication node can receive and
process pages from the paging network 211 and can also
receive and process messages (i.e., emails) from the
LAN, WAN or email connection 213.

When a user dials into the electronic network 206
from the communication device 202, the carrier network
216 routes the incoming call from the PSTN 214 to the
communication node 212 over one or more telephone lines
or trunks. The incoming calls preferably enter the
carrier network 216 through one or more "888" or "800"
INWATS trunk lines, local exchange trunk lines, or long
distance trunk lines. It is also contemplated that the
incoming calls can be received from a cable network, a
cellular system, or any other suitable system.

The communication node 212 answers the incoming
call from the carrier network 216 and retrieves an
appropriate announcement (i.e., a welcome greeting) from
a database, server, or browser. The node 212 then plays
the announcement to the caller. In response to audio
inputs from the user, the communication node 212
retrieves information from a destination or database of
one or more of the information sources, such as the
content providers 208 and 209 or the markup language
servers. After the communication node 212 receives the
information, the communication node provides a response
to the user based upon the retrieved information.

The node 212 can provide various dialog voice
personalities (i.e., a female voice, a male voice, etc.)
and can implement various grammars (i.e., vocabulary) to
detect and respond to the audio inputs from the user.
In addition, the communication node can automatically
select various speech recognition models (i.e., an
English model, a Spanish model, an English accent model,
etc.) based upon a user profile, the user's
communication device, and/or the user's speech patterns.
The communication node 212 can also allow the user to
select a particular speech recognition model.

When a user accesses the electronic network 206
from a communication device registered with the system
(i.e., a user's home phone, work phone, cellular phone,
etc.), the communication node 212 can by-pass a user
screening option and automatically identify the user (or
the type of the user's communication device) through the
use of automatic number identification (ANI) or caller
line identification (CLI). After the communication node
verifies the call, the node provides a greeting to the
user (i.e., "Hi, this is your personal agent, Maya.
Welcome Bob. How may I help you?"). The communication
node then enters into a dialogue with the user, and the
user can select a variety of information offered by the
communication node.

When the user accesses the electronic network 206
from a communication device not registered with the
system (i.e., a payphone, a phone of a non-subscriber,
etc.), the node answers the call and prompts the user to
enter his or her name and/or a personal identification
number (PIN) using speech commands or DTMF tones. The
node can also utilize speaker verification to identify a
particular speech pattern of the user. If the node
authorizes the user to access the system, the node
provides a personal greeting to the user (i.e., "Hi,
this is your personal agent, Maya. Welcome Ann. How
may I help you?"). The node then enters into a dialogue
with the user, and the user can select various
information offered by the node. If the name and/or PIN
of the user cannot be recognized or verified by the
node, the user will be routed to a customer service
representative.
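
The two access cases above (a registered device identified by ANI or CLI, and an unregistered device requiring a name and PIN) can be sketched as follows. This is a minimal illustration under stated assumptions: the tables, the telephone number, the PIN value, and the function name handle_call are hypothetical, and speaker verification is omitted.

```python
# Hypothetical data: caller line identification (CLI) -> subscriber name.
REGISTERED_DEVICES = {"3125550100": "Bob"}
# Hypothetical data: subscriber name -> PIN.
SUBSCRIBER_PINS = {"Ann": "4321"}

CUSTOMER_SERVICE = "route to customer service representative"


def handle_call(cli, name=None, pin=None):
    """Return a greeting for a verified caller, else a routing decision."""
    # Registered device: identify the user from the CLI, skipping screening.
    subscriber = REGISTERED_DEVICES.get(cli)
    # Unregistered device: fall back to the name and PIN given by the caller.
    if subscriber is None and name in SUBSCRIBER_PINS and SUBSCRIBER_PINS[name] == pin:
        subscriber = name
    if subscriber is None:
        return CUSTOMER_SERVICE
    return ("Hi, this is your personal agent, Maya. "
            f"Welcome {subscriber}. How may I help you?")


print(handle_call("3125550100"))
print(handle_call("0000000000", name="Ann", pin="4321"))
print(handle_call("0000000000", name="Ann", pin="9999"))
```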

As shown in FIG. 3, the communication node 212
preferably includes a telephone switch 230, a voice or
audio recognition (VRU) client 232, a voice recognition
(VRU) server 234, a controller or call control unit 236,
an Operation and Maintenance Office (OAM) or a billing
server unit 238, a local area network (LAN) 240, an
application server unit 242, a database server unit 244,
a gateway server or router firewall server 246, a voice
over internet protocol (VOIP) unit 248, a voice browser
250, a markup language server 251, and a paging server
252. Although the communication node 212 is shown as
being constructed with various types of independent and
separate units or devices, the communication node 212
can be implemented by one or more integrated circuits,
microprocessors, microcontrollers, or computers which
may be programmed to execute the operations or functions
equivalent to those performed by the devices or units
shown. It will also be recognized that the
communication node 212 can be carried out in the form of
hardware components and circuit designs, software or
computer programming, or a combination thereof.

The communication node 212 can be located in
various geographic locations throughout the world or the
United States (i.e., Chicago, Illinois). The
communication node 212 can be operated by one or more
carriers (i.e., Sprint PCS, Qwest Communications, MCI,
etc.) or independent service providers, such as, for
example, Motorola, Inc.

The communication node 212 can be co-located or
integrated with the carrier network 216 (i.e., an
integral part of the network) or can be located at a
remote site from the carrier network 216. It is also
contemplated that the communication node 212 may be
integrated into a communication device, such as a
wireline or wireless phone, a radio device, a personal
computer, a PDA, a PIM, etc. In this arrangement, the
communication device can be programmed to connect or
link directly into an information source.

The communication node 212 can also be configured
as a standalone system to allow users to dial directly
into the communication node via a toll free number or a
direct access number. In addition, the communication
node 212 may comprise a telephony switch (i.e., a PBX or
Centrex unit), an enterprise network, or a local area
network. In this configuration, the system 200 can be
implemented to automatically connect a user to the
communication node 212 when the user picks up a
communication device, such as the phone.

When the telephone switch 230 of the communication
node 212 receives an incoming call from the carrier
network 216, the call control unit 236 sets up a
connection in the switch 230 to the VRU client 232. The
communication node 212 then enters into a dialog with
the user regarding various services and functions. The
VRU client 232 preferably generates pre-recorded voice
announcements and/or messages to prompt the user to
provide inputs to the communication node using speech
commands or DTMF tones. In response to the inputs from
the user, the node 212 retrieves information from a
destination of one of the information sources and
provides outputs to the user based upon the information.

The telephone switch 230 of the communication
node 212 is preferably connected to the VRU client 232,
the VOIP unit 248, and the LAN 240. The telephone
switch 230 receives incoming calls from the carrier
network 216. The telephone switch 230 also receives
incoming calls from the communication device 204 routed
over the internet 220 via the VOIP unit 248. The switch
230 also receives messages and pages from the
communication devices 201 and 203, respectively. The
telephone switch 230 is preferably a digital
cross-connect switch, Model No. LNX, available from Excel
Switching Corporation, 255 Independence Drive, Hyannis,
MA 02601. It will be recognized that the telephone
switch 230 can be any suitable telephone switch.

The VRU client 232 of the communication node 212 is
preferably connected to the VRU server 234 and the LAN
240. The VRU client 232 processes speech
communications, DTMF tones, pages, and messages (i.e.,
emails) from the user. Upon receiving speech
communications from the user, the VRU client 232 routes
the speech communications to the VRU server 234. When
the VRU client 232 detects DTMF tones, the VRU client
232 sends a command to the call control unit 236. It
will be recognized that the VRU client 232 can be
integrated with the VRU server.
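
The routing behavior of the VRU client 232 described above can be sketched as a simple dispatch on the kind of input. The function name route_input, the string labels for the destinations, and the shape of the command are assumptions for illustration. This passage does not state where pages and messages are forwarded, so the sketch rejects them rather than guessing.

```python
def route_input(kind, payload):
    """Return (destination, data) for one input received by the VRU client."""
    if kind == "speech":
        # Speech communications are routed to the VRU server 234.
        return ("vru_server_234", payload)
    if kind == "dtmf":
        # Detected DTMF tones result in a command to the call control unit 236.
        return ("call_control_unit_236", {"command": payload})
    # Destinations for other input kinds are not specified in this passage.
    raise ValueError(f"no routing rule for input kind: {kind!r}")


print(route_input("speech", b"raw audio bytes"))
print(route_input("dtmf", "5"))
```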

The VRU client 232 preferably comprises a computer,
such as a Windows NT compatible computer with hardware
capable of connecting individual telephone lines
directly to the switch 230. The VRU client preferably
includes a microprocessor, random access memory,
read-only memory, a T1 or ISDN interface board, and one or
more voice communication processing boards (not shown).
The voice communication processing boards of the VRU
client 232 are preferably Dialogic boards, Model No.
Antares, available from Dialogic Corporation, 1515 Route
10, Parsippany, N.J. 07054. The voice communication
boards may include a voice recognition engine having a
vocabulary for detecting a speech pattern (i.e., a key
word or phrase). The voice recognition engine is
preferably a RecServer software package, available from
Nuance Communications, 1380 Willow Road, Menlo Park,
California 94025.

The VRU client 232 can also include an echo
canceler (not shown) to reduce or cancel text-to-speech
or playback echoes transmitted from the PSTN 214 due to
hybrid impedance mismatches. The echo canceler is
preferably included in an Antares Board Support Package,
available from Dialogic.

The call control unit 236 of the communication node
212 is preferably connected to the LAN 240. The call
control unit 236 sets up the telephone switch 230 to
connect incoming calls to the VRU client 232. The call
control unit also sets up incoming calls or pages into
the node 212 over the internet 220 and pages and
messages sent from the communication devices 201 and 203
via the paging network 211 and email system 213. The
call control unit 236 preferably comprises a computer,
such as a Windows NT compatible computer.

The LAN 240 of the communication node 212 allows
the various components and devices of the node 212 to
communicate with each other via a twisted pair, a fiber
optic cable, a coaxial cable, or the like. The LAN 240
may use Ethernet, Token Ring, or other suitable types of
protocols. The LAN 240 is preferably a 100 Megabit per
second Ethernet switch, available from Cisco Systems,
San Jose, California. It will be recognized that the
LAN 240 can comprise any suitable network system, and
the communication node 212 may include a plurality of
LANs.

The VRU server 234 of the communication node 212 is
connected to the VRU client 232 and the LAN 240. The VRU
server 234 receives speech communications from the user
via the VRU client 232. The VRU server 234 processes
the speech communications and compares the speech
communications against a vocabulary or grammar stored in
the database server unit 244 or a memory device. The
VRU server 234 provides output signals, representing the
result of the speech processing, to the LAN 240. The
LAN 240 routes the output signal to the call control
unit 236, the application server 242, and/or the voice
browser 250. The communication node 212 then performs a
specific function associated with the output signals.

The VRU server 234 preferably includes a
text-to-speech (TTS) unit 252, an automatic speech recognition
(ASR) unit 254, and a speech-to-text (STT) unit 256.
The TTS unit 252 of the VRU server 234 receives textual
data or information (i.e., e-mail, web pages, documents,
files, etc.) from the application server unit 242, the
database server unit 244, the call control unit 236, the
gateway server 246, and the voice browser 250. The TTS
unit 252 processes the textual data and converts the
data to voice data or information.

The TTS unit 252 can provide data to the VRU client
232 which reads or plays the data to the user. For
example, when the user requests information (i.e., news
updates, stock information, traffic conditions, etc.),
the communication node 212 retrieves the desired data
(i.e., textual information) from a destination of
one or more of the information sources and converts the
data via the TTS unit 252 into a response.

The response is then sent to the VRU client 232. 
The VRU client processes the response and reads an audio 
message to the user based upon the response. It is 
contemplated that the VRU server 234 can read the audio 

15 message to the user using human recorded speech or 

synthesized speech. The TTS unit 252 is preferably a 
TTS 20 0 0 software package, available from Lernout and 
Hauspie Speech Product NV, 52 Third Avenue, Burlington, 
Mass. 01803. 

The ASR unit 254 of the VRU server 234 provides
speaker independent automatic speech recognition of
speech inputs or communications from the user. It is
contemplated that the ASR unit 254 can include speaker
dependent speech recognition. The ASR unit 254
processes the speech inputs from the user to determine
whether a word or a speech pattern matches any of the
grammars or vocabulary stored in the database server
unit 244 or downloaded from the voice browser. When the
ASR unit 254 identifies a selected speech pattern of the
speech inputs, the ASR unit 254 sends an output signal
to implement the specific function associated with the
recognized voice pattern. The ASR unit 254 is preferably
a speaker independent speech recognition software
package, Model No. RecServer, available from Nuance
Communications. It is contemplated that the ASR unit
254 can be any suitable speech recognition unit to
detect voice communications from a user.

The STT unit 256 of the VRU server 234 receives
speech inputs or communications from the user and
converts the speech inputs to textual information (i.e.,
a text message). The textual information can be sent or
routed to the communication devices 201, 202, 203 and
204, the content providers 208 and 209, the markup
language servers, the voice browser, and the application
server 242. The STT unit 256 is preferably a Naturally
Speaking software package, available from Dragon
Systems, 320 Nevada Street, Newton, MA 02160-9803.

The VOIP unit 248 of the telecommunication node 212
is preferably connected to the telephone switch 230 and
the LAN 240. The VOIP unit 248 allows a user to access
the node 212 via the internet 220 using voice commands.
The VOIP unit 248 can receive VOIP protocols (i.e.,
H.323 protocols) transmitted over the internet 220 and
can convert the VOIP protocols to speech information or
data. The speech information can then be read to the
user via the VRU client 232. The VOIP unit 248 can also
receive speech inputs or communications from the user
and convert the speech inputs to a VOIP protocol that
can be transmitted over the internet 220. The VOIP unit
248 is preferably a Voice Net software package,
available from Dialogic Corporation. It will be
recognized that the VOIP device can be incorporated into
a communication device.

The telecommunication node 212 also includes a
detection unit 260. The detection unit 260 is preferably
a phrase or key word spotter unit to detect incoming
audio inputs or communications or DTMF tones from the
user. The detection unit 260 is preferably incorporated
into the switch 230, but can be incorporated into the
VRU client 232, the carrier switch 216, or the VRU
server 234. The detection unit 260 is preferably
included in a RecServer software package, available from
Nuance Communications.

The detection unit 260 records the audio inputs
from the user and compares the audio inputs to the
vocabulary or grammar stored in the database server unit
244. The detection unit continuously monitors the user's
audio inputs for a key phrase or word after the user is
connected to the node 212. When the key phrase or word
is detected by the detection unit 260, the VRU client
232 plays a pre-recorded message to the user. The VRU
client 232 then responds to the audio inputs provided by
the user.

The billing server unit 238 of the communication
node 212 is preferably connected to the LAN 240. The
billing server unit 238 can record data about the use of
the communication node by a user (e.g., length of calls,
features accessed by the user, etc.). Upon completion
of a call by a user, the call control unit 236 sends
data to the billing server unit 238. The data can be
subsequently processed by the billing server unit in
order to prepare customer bills. The billing server unit
238 can use the ANI or CLI of the communication device
to properly bill the user. The billing server unit 238
preferably comprises a Windows NT compatible computer.

The gateway server unit 246 of the communication
node 212 is preferably connected to the LAN 240 and the
internet 220. The gateway server unit 246 provides
access to the content provider 208 and the markup
language server 257 via the internet 220. The gateway
server unit 246 also allows users to access the
communication node 212 from the communication device 204
via the internet 220. The gateway server unit 246 can
further function as a firewall to restrict access to the
communication node 212 to authorized users. The gateway
server unit 246 is preferably a Cisco Router, available
from Cisco Systems.

The database server unit 244 of the communication
node 212 is preferably connected to the LAN 240. The
database server unit 244 preferably includes a plurality
of storage areas to store data relating to users, speech
vocabularies, dialogs, personalities, user entered data,
and other information. Preferably, the database server
unit 244 stores a personal file or address book. The
personal address book can contain information required
for the operation of the system, including user
reference numbers, personal access codes, personal
account information, contacts' addresses and phone
numbers, etc. The database server unit 244 is
preferably a computer, such as a Windows NT compatible
computer.

The application server 242 of the communication
node 212 is preferably connected to the LAN 240 and the
content provider 209. The application server 242 allows
the communication node 212 to access information from a
destination of the information sources, such as the
content providers and markup language servers. For
example, the application server can retrieve information
(e.g., weather reports, stock information, traffic
reports, restaurants, flower shops, banks, etc.) from a
destination of the information sources. The application
server 242 processes the retrieved information and
provides the information to the VRU server 234 and the
voice browser 250. The VRU server 234 can provide an
audio announcement to the user based upon the
information using text-to-speech synthesizing or human
recorded voice. The application server 242 can also
send tasks or requests (i.e., transactional information)
received from the user to the information sources (e.g.,
a request to place an order for a pizza). The
application server 242 can further receive user inputs
from the VRU server 234 based upon a speech recognition
output. The application server is preferably a
computer, such as a Windows NT compatible computer.

The markup language server 251 of the communication
node 212 is preferably connected to the LAN 240. The
markup language server 251 can include a database,
scripts, and markup language documents or pages. The
markup language server 251 is preferably a computer,
such as a Windows NT compatible computer. It will also
be recognized that the markup language server 251 can be
an internet server (e.g., a Sun Microsystems server).

The paging server 252 of the communication node 212
is preferably connected to the LAN 240 and the paging
network 211. The paging server 252 routes pages between
the LAN 240 and the paging network. The paging server
252 is preferably a computer, such as a Windows NT
compatible computer.

The voice browser 250 of the system 200 is
preferably connected to the LAN 240. The voice browser
250 preferably receives information from the information
sources, such as the content provider 209 via the
application server 242, the markup language servers 251
and 257, the database 244, and the content provider 208.
In response to voice inputs from the user or DTMF tones,
the voice browser 250 generates a content request (i.e.,
an electronic address) to navigate to a destination of
one or more of the information sources. The content
request can use at least a portion of a URL, a URN, an
IP address, a page request, or an e-mail.

After the voice browser is connected to an
information source, the voice browser preferably uses a
TCP/IP connection to pass requests to the information
source. The information source responds to the
requests, sending at least a portion of the requested
information, represented in electronic form, to the
voice browser. The information can be stored in a
database of the information source and can include text
content, markup language documents or pages, non-text
content, dialogs, audio sample data, recognition
grammars, etc. The voice browser then parses and
interprets the information as further described below.
It will be recognized that the voice browser can be
integrated into the communication devices 201, 202, 203,
and 204.

As shown in FIG. 3, the content provider 209 is
connected to the application server 242 of the
communication node 212, and the content provider 208 is
connected to the gateway server 246 of the communication
node 212 via the internet 220. The content providers
can store various content information, such as news,
weather, traffic conditions, etc. The content providers
208 and 209 can include a server to operate web pages or
documents in the form of a markup language. The content
providers 208 and 209 can also include a database,
scripts, and/or markup language documents or pages. The
scripts can include images, audio, grammars, computer
programs, etc. The content providers execute suitable
server software to send requested information to the
voice browser.

Referring now to FIG. 4, a block diagram of the
voice browser 250 of the communication node 212 is
illustrated. The voice browser 250 generally includes a
network fetcher unit 300, a parser unit 302, an
interpreter unit 304, and a state machine unit 306.
Although the voice browser is shown as being constructed
with various types of independent and separate units or
devices, it will be recognized that the voice browser
250 can be carried out in the form of hardware
components and circuit designs, software or computer
programming, or a combination thereof.

The network fetcher 300 of the voice browser 250 is
connected to the parser 302 and the interpreter 304.
The network fetcher 300 is also connected to the LAN 240
of the communication node 212. The network fetcher unit
300 retrieves information, including markup language
documents, audio samples and grammars, from the
information sources.

The parser unit 302 of the voice browser 250 is
connected to the network fetcher unit 300 and the state
machine unit 306. The parser unit 302 receives the
information from the network fetcher unit 300 and parses
the information according to the syntax rules of the
markup language (i.e., extensible markup language
syntax) as further described below. The parser unit 302
generates a tree or hierarchical structure representing
the markup language that is stored in memory of the
state machine unit 306. A tree structure of an
exemplary markup language document is shown in FIG. 7.
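
As a rough illustration of this parsing step, the sketch below
builds a tree from a small page and indexes its STEP elements by
name. It assumes Python's standard xml.etree.ElementTree module
as the parser, which the source does not name; the sample page
and the helper names build_step_table and outline are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; element and attribute names follow the
# examples given elsewhere in this document.
SAMPLE = """<DIALOG>
  <STEP NAME="init">
    <PROMPT>Please select a soft drink.</PROMPT>
    <INPUT TYPE="optionlist" NAME="drink">
      <OPTION NEXT="#confirm">coke</OPTION>
    </INPUT>
  </STEP>
  <STEP NAME="confirm">
    <PROMPT>You ordered a <VALUE NAME="drink"/>.</PROMPT>
  </STEP>
</DIALOG>"""


def build_step_table(markup: str) -> dict:
    """Parse a page and return a mapping of step name -> STEP element."""
    root = ET.fromstring(markup)
    if root.tag != "DIALOG":
        raise ValueError("root element must be DIALOG")
    return {step.get("NAME"): step for step in root.findall("STEP")}


def outline(element, depth=0):
    """Return the element hierarchy as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines
```

Calling outline on a STEP element returns one line per element,
indented by depth, which mirrors the kind of tree structure the
parser unit hands to the state machine unit.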

The following text defines the syntax and grammar 
that the parser unit of the voice browser utilizes to 
build a tree structure of the markup language document. 



<!ELEMENT dialog (step | class)*>
<!ATTLIST dialog bargein (Y|N) "Y">
<!ELEMENT step
    (prompt | input | help | error | cancel | ack)*>
<!ATTLIST step name ID #REQUIRED
               parent IDREF #IMPLIED
               bargein (Y|N) "Y"
               cost CDATA #IMPLIED>
<!ELEMENT class (prompt | help | error | cancel | ack)*>
<!ATTLIST class name ID #REQUIRED
                parent IDREF #IMPLIED
                bargein (Y|N) "Y"
                cost CDATA #IMPLIED>
<!ELEMENT prompt
    (#PCDATA | options | value | emp | break | pros | audio)*>
<!ELEMENT emp
    (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST emp level (strong | moderate | none | reduced)
                    "moderate">
<!ELEMENT pros
    (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST pros rate CDATA #IMPLIED
               vol CDATA #IMPLIED
               pitch CDATA #IMPLIED
               range CDATA #IMPLIED>
<!ELEMENT help
    (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST help ordinal CDATA #IMPLIED
               reprompt (Y|N) "N"
               next CDATA #IMPLIED
               nextmethod (get | post) "get">
<!ELEMENT error
    (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST error type NMTOKENS "ALL"
                ordinal CDATA #IMPLIED
                reprompt (Y|N) "N"
                next CDATA #IMPLIED
                nextmethod (get | post) "get">
<!ELEMENT cancel
    (#PCDATA | value | emp | break | pros | audio)*>
<!ATTLIST cancel next CDATA #REQUIRED
                 nextmethod (get | post) "get">
<!ELEMENT audio EMPTY>
<!ATTLIST audio src CDATA #REQUIRED>
<!ELEMENT ack
    (#PCDATA | options | value | emp | break | pros | audio)*>
<!ATTLIST ack confirm NMTOKEN "YORN"
              background (Y|N) "N"
              reprompt (Y|N) "N">
<!ELEMENT input
    (option | response | rename | switch | case)*>
<!ATTLIST input type
    (none | optionlist | record | grammar | profile | hidden |
     yorn | digits | number | time | date | money | phone) #REQUIRED
                name ID #IMPLIED
                next CDATA #IMPLIED
                nextmethod (get | post) "get"
                timeout CDATA #IMPLIED
                min CDATA #IMPLIED
                max CDATA #IMPLIED
                profname NMTOKEN #IMPLIED
                subtype NMTOKEN #IMPLIED
                src CDATA #IMPLIED
                value CDATA #IMPLIED
                msecs CDATA #IMPLIED
                storage (file | request) #REQUIRED
                format CDATA #IMPLIED>
<!ELEMENT switch (case | switch)*>
<!ATTLIST switch field NMTOKEN #REQUIRED>
<!ELEMENT response (switch)*>
<!ATTLIST response next CDATA #IMPLIED
                   nextmethod (get | post) "get"
                   fields NMTOKENS #REQUIRED>
<!ELEMENT rename EMPTY>
<!ATTLIST rename varname NMTOKEN #REQUIRED
                 recname NMTOKEN #REQUIRED>
<!ELEMENT case EMPTY>
<!ATTLIST case value CDATA #REQUIRED
               next CDATA #REQUIRED
               nextmethod (get | post) "get">
<!ELEMENT value EMPTY>
<!ATTLIST value name NMTOKEN #REQUIRED>
<!ELEMENT break EMPTY>
<!ATTLIST break msecs CDATA #IMPLIED
                size (none | small | medium | large)
                     "medium">
<!ELEMENT options EMPTY>
<!ELEMENT or EMPTY>
<!ELEMENT option (#PCDATA | value | or)*>
<!ATTLIST option value CDATA #IMPLIED
                 next CDATA #IMPLIED
                 nextmethod (get | post) "get">

Referring again to FIG. 4, the interpreter unit 304
of the voice browser 250 is connected to the state
machine unit 306 and the network fetcher unit 300. The
interpreter unit 304 is also connected to the LAN. The
interpreter unit 304 carries out a dialog with the user
based upon the tree structure representing a markup
language document. The interpreter unit sends data to
the TTS 252. The interpreter unit 304 can also receive
data based upon inputs from the user via a VRU server
and can send outputs to the information source based
upon the user inputs.

The interpreter unit 304 can transition from state
to state (i.e., step to step) within a tree structure
(i.e., a dialog) of a markup language document or can
transition to a new tree structure within the same
dialog or another dialog. The interpreter unit
determines the next state or step based upon the
structure of the dialog and the inputs from the user.
When the interpreter unit transitions to a new dialog or
page, the address of the new dialog or page is then sent
to the network fetcher.

The state machine 306 of the voice browser 250 is
connected to the parser unit 302 and the interpreter
unit 304. The state machine 306 stores the tree
structure of the markup language and maintains the
current state or step that the voice browser is
executing.
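
The role of the state machine can be pictured with a minimal
sketch such as the one below. The class and method names are
hypothetical, and the step table is assumed to be a plain
dictionary keyed by step name; the source does not specify how
the tree or the current step is held in memory.

```python
class StateMachine:
    """Holds the parsed step table and the step currently executing."""

    def __init__(self, steps, start="init"):
        if start not in steps:
            raise KeyError(f"no step named {start!r}")
        self.steps = steps      # step name -> parsed step
        self.current = start    # name of the step being executed

    def transition(self, next_name):
        """Move to another step held in this table.

        Returns False, leaving the current step unchanged, when the
        target is not in the table; a caller would then need to fetch
        a new page before continuing.
        """
        if next_name in self.steps:
            self.current = next_name
            return True
        return False
```

Keeping the lookup table and the current step in one object is
only one possible design; it keeps the interpreter free of
bookkeeping about where it is in the dialog.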

FIGS. 5a-5c illustrate a flow diagram of a software
routine executed by the voice browser 250. The software
routine supports interactive voice applications. At
block 400, the voice browser 250 determines an initial
address (i.e., a URL) and a step element or name. The
voice browser then fetches the contents (i.e., a markup
language document) of the current address from the
information sources (i.e., content providers and markup
language servers) at block 402. After the voice browser
fetches the address, the voice browser processes the
contents and builds a local step table (i.e., a tree
structure) at block 404.

At block 406, a prompt can be played to the user
via the TTS unit of the system 200 for the current
element. The voice browser then waits for an input from
the user (i.e., speech or DTMF tones). At block 408,
the voice browser can collect input from the user for
the current step element. FIG. 5c shows an exemplary
flow diagram of a routine that is executed by the voice
browser to determine the grammar for speech recognition.

At block 502, the voice browser determines whether
a pre-determined grammar exists for the user input and
the markup language. For example, the voice browser
determines whether the grammar for the user input is
found in a predetermined or pre-existing grammar stored
in a database or contained in the markup language. If
the grammar is found, the voice browser sends the
grammar to the VRU server at block 504. At block 506,
the VRU server compares the user input to the grammar to
recognize the user input. After the VRU server
recognizes the user input, the process proceeds to block
410 (see FIG. 5a) as described below.

If a pre-existing grammar is not found at block
502, the voice browser dynamically generates the grammar
for the user input. At block 508, the voice browser
looks up the pronunciations for the user input in a
dictionary. The dictionary can be stored in a
database of the system or stored on an external database
(i.e., the voice browser can fetch a dictionary from the
processor or from the internet).

At block 510, the voice browser generates the
grammar for the user inputs based upon the
pronunciations from the dictionary and phonetic rules.
A software routine available from Nuance Communications,
Model No. RecServer, can be used to generate the
grammar. At block 512, the grammar is sent to the VRU
server. The voice browser then attempts to match the
grammar to the user input at block 506.
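
The grammar selection of FIG. 5c can be summarized in a short
sketch. Everything below is illustrative: the function name, the
dictionary-based representation of a grammar, and the
pronunciation strings are assumptions, and the phonetic rules
mentioned above are left out entirely.

```python
def resolve_grammar(phrases, stored_grammars, dictionary):
    """Return (grammar, source) for the phrases a step expects.

    stored_grammars maps a tuple of phrases to a pre-existing grammar.
    dictionary maps a single word to a pronunciation string.
    """
    key = tuple(phrases)
    if key in stored_grammars:             # block 502: grammar found
        return stored_grammars[key], "stored"

    grammar = {}
    for phrase in phrases:                 # blocks 508 and 510
        words = phrase.lower().split()
        missing = [w for w in words if w not in dictionary]
        if missing:
            raise KeyError(f"no pronunciation for {missing}")
        # One pronunciation per word, in spoken order.
        grammar[phrase] = [dictionary[w] for w in words]
    return grammar, "generated"
```

In either case the resulting grammar would then be handed to the
VRU server for matching against the user input.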

After the voice browser detects or collects an
input from the user at block 408, the voice browser
determines whether there is an error at block 410. If
the voice browser is having difficulty recognizing
inputs from the user or detects a recognition error, a
timeout error, etc., an appropriate error message is
played to the user at block 414. For example, if the
voice browser detected too much speech from the user or
the recognition is too slow, a prompt is played (i.e.,
"Sorry, I didn't understand you") to the user via the
VRU server. If the voice browser receives unexpected
DTMF tones, a prompt is played (i.e., "I heard tones.
Please speak your response") to the user via the VRU
server. If the voice browser does not detect any speech
from the user, a prompt is read to the user (i.e., "I am
having difficulty hearing you").

At block 416, the voice browser determines whether
a re-prompt was specified in the error response or
element. If a re-prompt is to be played to the user at
block 416, the process proceeds to block 406 as
described above. If a re-prompt is not to be played to
the user at block 416, the voice browser determines
whether there is a next step element specified in the
error response at block 420. If another step element is
specified in the error response at block 420, the
process proceeds to block 402 as described above. If
another step element is not specified in the error
response at block 420, the process proceeds to block
422.
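
The branch taken after an error message can be written as a small
decision function. This is a sketch only: the function name and
its return convention are hypothetical, and the block numbers are
simply those of FIGS. 5a-5c.

```python
def after_error(reprompt, next_step):
    """Choose what follows an error message (blocks 416 and 420).

    Returns a (block, target) pair, where block is the flow-diagram
    block the process proceeds to and target is the step to fetch,
    if any.
    """
    if reprompt:                  # block 416: re-prompt specified
        return 406, None          # replay the current prompt
    if next_step is not None:     # block 420: next step specified
        return 402, next_step     # fetch the named step
    return 422, None              # neither was specified
```

For instance, an error response with no re-prompt and a next step
of "#menu" yields (402, "#menu").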

If the voice browser does not detect a recognition
error at block 410, the voice browser determines whether
the user requested help at block 412. If the user
requested help, an appropriate help response is played
to the user (i.e., "please enter or speak your pin") at
block 424.

At block 425, the voice browser determines whether
a re-prompt was specified in the help response or step.
If a re-prompt is specified in the help response at
block 425, the process proceeds to block 406 as
described above. If a re-prompt is not specified in the
help response at block 425, the voice browser determines
whether a next step element is specified in the help
response at block 426. If another step element is
specified in the help response at block 426, the process
proceeds to block 402 as described above. If another
step element is not specified in the help response at
block 426, the process proceeds to block 428.

At block 430, the voice browser determines whether
a cancel request has been indicated by the user. If the
voice browser detects a cancel request from the user at
block 430, an appropriate cancel message is played to
the user at block 434 (i.e., "Do you wish to exit and
return to the Main Menu?").

At block 436, the voice browser then determines
whether a next step element is specified in the
cancel response or element. If another step element is
specified in the cancel response at block 436, the
process proceeds to block 448. If another step element
is not specified in the cancel response at block 436,
the process proceeds to block 422.

If a cancel request was not detected at block 430,
the voice browser determines the next step element at
block 432. At block 440, the voice browser determines
whether there is an acknowledgement specified in the
next step element. If there is no acknowledgement
specified in the step element at block 440, the voice
browser sets the current step element to the next step
element at block 442 and then determines whether the
next step element is within the same page at block 444.

If the next step element is within the same page as
the current step element at block 444, the process
proceeds to block 446. If the next step element is not
within the same page as the current page at block 444,
the process proceeds to block 448.
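
One way to picture the same-page test of block 444 is to treat a
NEXT value such as "#confirm" as a URL reference whose fragment
names a step. That reading is an assumption drawn from the
examples in this document, not a stated rule; the function name
and the example addresses are hypothetical.

```python
from urllib.parse import urldefrag, urljoin


def resolve_next(current_url, next_attr):
    """Resolve a NEXT value against the address of the current page.

    Returns (same_page, page_url, step_name).
    """
    absolute = urljoin(current_url, next_attr)
    page_url, fragment = urldefrag(absolute)
    current_page, _ = urldefrag(current_url)
    # Execution of a dialog begins with the step named "init", so
    # that name is used when the reference carries no fragment.
    step_name = fragment or "init"
    return page_url == current_page, page_url, step_name
```

A result with same_page set to False corresponds to the case in
which the address of the new page is passed to the network
fetcher.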

If an acknowledgement is specified in the next step
element at block 440, an acknowledgement response is
played to the user at block 450. The voice browser then
determines whether a confirmation is specified in the
information (i.e., a markup language document) at block
452. If a confirmation is not specified in the
information at block 452, the process proceeds to block
442 as described above. If a confirmation is specified
at block 452, the voice browser determines whether the
response was recognized from the user at block 454 and
then determines whether the response is affirmative at
block 456.

If the voice browser receives an affirmative
response at block 456, the process proceeds to block 442
as described above. If the voice browser does not
receive an affirmative response from the user at block
456, the process proceeds to block 448.

The following text describes an exemplary markup
language processed by the voice browser of the
communication node 212. The markup language preferably
includes text, recorded sound samples, navigational
controls, and input controls for voice applications as
further described below. The markup language enables
system designers or developers of service or content
providers to create application programs for instructing
the voice browser to provide a desired user interactive
voice service. The markup language also enables
designers to dynamically customize their content. For
example, designers can provide up-to-date news, weather,
traffic, etc.

The markup language can be designed to express flow
of control, state management, and the content of
information flow between the communication node 212 and
the user. The structure of the language can be designed
specifically for voice applications and the markup
language is preferably designed and delivered in units
of dialog.

The markup language can include elements that
describe the structure of a document or page, provide
pronunciation of words and phrases, and place markers in
the text to control interactive voice services. The
markup language also provides elements that control
phrasing, emphasis, pitch, speaking rate, and other
characteristics. The markup language documents are
preferably stored on databases of the information
sources, such as the content providers 208 and 209 and
the markup language servers 251 and 257.

FIG. 6 illustrates an exemplary markup language
document that the voice browser of the communication
node can process. The markup language document has a
hierarchical structure, in which every element (except
the dialog element) is contained by another element.
Elements contained within another element are defined to
be children, or lower elements, of the tree. FIG. 7
illustrates a tree structure of the markup language
document of FIG. 6.

As shown in FIG. 6, the markup language document
includes tags, denoted by <> symbols, with the actual
element between the brackets. The markup language
includes start tags ("< >") and end tags ("</ >"). A
start tag begins a markup element and an end tag ends
the corresponding markup element. For example, in the
markup language document as shown in FIG. 6, the DIALOG
element (<dialog>) on line 2 begins a markup language
document or page, and the dialog element (</dialog>) on
line 26 indicates the markup language document has
ended. The elements often have attributes which are
assigned values as further described below.

The DIALOG element and STEP elements of a markup
language document provide the basic structure of the
document. The DIALOG element defines the scope of the
markup language document, and all other elements are
contained by the DIALOG element. The STEP elements
define states within a DIALOG element (i.e., the STEP
element defines an application state). For example, an
application state can include initial prompts, help
messages, error messages, or cleanup and exit
procedures.

The DIALOG element and the associated STEP elements
of a markup language document define a state machine
that represents an interactive dialogue between the
voice browser and the user. When the voice browser
interprets the markup language document, the voice
browser will navigate through the DIALOG element to
different STEP elements as a result of the user's
responses.

The following example illustrates an exemplary
markup language document that the voice browser of the
communication node can process. The example has one
DIALOG element and two STEP elements.

<?xml version="1.0"?>
<DIALOG>
  <STEP NAME="init">
    <PROMPT> Please select a soft drink. </PROMPT>
    <HELP> Your choices are coke, pepsi, 7 up,
      or root beer. </HELP>
    <INPUT TYPE="optionlist" NAME="drink">
      <OPTION NEXT="#confirm"> coke </OPTION>
      <OPTION NEXT="#confirm"> pepsi </OPTION>
      <OPTION NEXT="#confirm"> 7 up </OPTION>
      <OPTION NEXT="#confirm"> root beer </OPTION>
    </INPUT>
  </STEP>
  <STEP NAME="confirm">
    <PROMPT> You ordered a <VALUE NAME="drink"/>.
    </PROMPT>
  </STEP>
</DIALOG>

When the above markup language document is
interpreted by the voice browser, the voice browser
initially executes the STEP element called "init".
First, the user will hear the text contained by the
prompt element (i.e., "Please select a soft drink.").
If the user responds "help" before making a selection,
the user would hear the text contained within the HELP
element (i.e., "Your choices are coke, pepsi, 7 up, or
root beer."). After the user makes a selection, the
voice browser will execute the STEP element named
"confirm", which will read back the user's selection and
then exit the application. It is noted that the STEP
elements in a markup language document are executed
based on the user's responses, not on the order of the
STEP elements within the source file. Although the
definition of the "init" STEP element appears before
the definition of the "confirm" STEP element, the order
in which they are defined has no impact on the order in
which the voice browser navigates through them.
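
The walk-through above can be reproduced with a toy interpreter.
The sketch below is not the voice browser's implementation: it
assumes Python's standard xml.etree.ElementTree module, replaces
speech recognition with a scripted list of strings, and omits
error, cancel, and acknowledgement handling. The names
prompt_text and run_dialog are hypothetical.

```python
import xml.etree.ElementTree as ET

PAGE = """<?xml version="1.0"?>
<DIALOG>
  <STEP NAME="init">
    <PROMPT> Please select a soft drink. </PROMPT>
    <HELP> Your choices are coke, pepsi, 7 up, or root beer. </HELP>
    <INPUT TYPE="optionlist" NAME="drink">
      <OPTION NEXT="#confirm"> coke </OPTION>
      <OPTION NEXT="#confirm"> pepsi </OPTION>
      <OPTION NEXT="#confirm"> 7 up </OPTION>
      <OPTION NEXT="#confirm"> root beer </OPTION>
    </INPUT>
  </STEP>
  <STEP NAME="confirm">
    <PROMPT> You ordered a <VALUE NAME="drink"/>. </PROMPT>
  </STEP>
</DIALOG>"""


def prompt_text(prompt, variables):
    """Flatten a PROMPT, replacing VALUE elements with stored input."""
    parts = [prompt.text or ""]
    for child in prompt:
        if child.tag == "VALUE":
            parts.append(variables.get(child.get("NAME"), ""))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


def run_dialog(markup, user_inputs):
    """Run a dialog against scripted inputs; return what was spoken."""
    root = ET.fromstring(markup)
    steps = {s.get("NAME"): s for s in root.findall("STEP")}
    inputs = iter(user_inputs)
    variables, spoken, current = {}, [], "init"
    while True:
        step = steps[current]
        spoken.append(prompt_text(step.find("PROMPT"), variables))
        field = step.find("INPUT")
        if field is None:            # no INPUT: the application ends
            return spoken
        options = {" ".join(o.text.split()): o.get("NEXT")
                   for o in field.findall("OPTION")}
        while True:
            said = next(inputs)      # unrecognized input is skipped
            if said == "help":
                spoken.append(" ".join(step.find("HELP").text.split()))
            elif said in options:
                variables[field.get("NAME")] = said
                current = options[said].lstrip("#")
                break
```

Running run_dialog(PAGE, ["help", "pepsi"]) plays the initial
prompt, then the help text, then the confirmation for "pepsi".
The step table is looked up by name, so reordering the two STEP
elements in PAGE would not change the result.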

The following text describes the markup language
elements, their attributes, and their syntax. The
DIALOG element of the markup language (i.e., <DIALOG
[BARGEIN="value"]> markup language document </DIALOG>)
is the fundamental element of the markup language. The
DIALOG element includes a BARGEIN attribute. The value
of the BARGEIN attribute can be "Y" or "N". The
BARGEIN attribute allows the DIALOG element to be
interrupted at any time based upon a predetermined
response from the user (i.e., wake up).

The DIALOG element defines the basic unit of
context within an application, and typically, there is
one DIALOG element per address (i.e., URL). Each DIALOG
element contains one STEP element named "init". The
execution of the DIALOG element begins with the STEP
named "init".

The following example of a markup language document
or page contains the DIALOG element.

<DIALOG>
  <STEP NAME="init">
    <PROMPT> Welcome to VoxML™ voice markup
      language. </PROMPT>
  </STEP>
</DIALOG>

In the example above, the DIALOG element contains a
single STEP element named "init". The STEP element has
a single PROMPT element that will be read to the user
via the text-to-speech unit 252. Since there is no
INPUT element defined in the STEP element, the markup
language application will terminate immediately after
the PROMPT element is read.

The STEP element of the markup language (i.e.,
<STEP NAME="value" [PARENT="value"] [BARGEIN="value"]
[COST="value"]> text </STEP>) defines a state in a
markup language document or page. The STEP element is
contained by a DIALOG element. The STEP element
includes a NAME attribute, a PARENT attribute, a BARGEIN
attribute, and a COST attribute. The value of the NAME
and PARENT attributes can be an identifier (i.e., a
pointer or a variable name), the value of the BARGEIN
attribute can be "Y" or "N", and the value of the COST
attribute can be an integer.

The STEP element typically has an associated PROMPT
element and INPUT element that define the application
state. The following example illustrates the use of the
STEP element in a markup language document.

5 <STEP NAME="askpython n PARENT= " tvr at ing " > 

<PROMPT> Please rate Monty Python's Flying 
Circus 

on a scale of 1 to 10. </PROMPT> 
< INPUT NAME=" python" TYPE= " number M 
10 NEXT="#drwho M /> 
</STEP> 

The example shown above illustrates a STEP element 
that collects the user's opinion on one of several 

15 public television shows. The STEP element uses the 

PARENT attribute to share a common set of help and error 
elements with other TV-show-rating STEP elements. For 
example , the PARENT attribute can contain a HELP element 
explaining what a rating of 1, 5, and 10 would mean, and 

20 a common error message can remind the user that a 
numeric rating is expected. 

The PROMPT element of the markup language (i.e.,
<PROMPT> text </PROMPT>) is used to define content
(i.e., text or an audio file) that is to be presented to
the user. Typically, the PROMPT element will contain
text and several markup elements (i.e., the BREAK or EMP
elements as described below) that are read to the user
via the text-to-speech unit.

The PROMPT element can be contained within a STEP
or a CLASS element. The following example illustrates
the use of the PROMPT element in a markup language
document or page.

<STEP NAME="init">
  <PROMPT> How old are you? </PROMPT>
  <INPUT TYPE="number" NAME="age" NEXT="#weight"/>
</STEP>

In the example shown above, the text "How old are
you?" will be played to the user via the text-to-speech
unit, and then the voice browser will wait for the user
to say his or her age.

The INPUT element of the markup language is used to
define a valid user input within each STEP element. The
INPUT element is contained within a STEP element. The
INPUT element of the markup language includes an INPUT
attribute. The value of the INPUT attribute can be a
DATE input, a DIGITS input, a FORM input, a GRAMMAR
input, a HIDDEN input, a MONEY input, a NONE input, a
NUMBER input, an OPTIONLIST input, a PHONE input, a
PROFILE input, a RECORD input, a TIME input, or a YORN
input.

The DATE input of the INPUT attribute of the markup
language (i.e., <INPUT TYPE="DATE" NAME="value"
NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"]/>)
is used to collect a calendar date from the user. The
DATE input includes a NAME attribute, a NEXT attribute,
a NEXTMETHOD attribute, and a TIMEOUT attribute. The
value of the NAME attribute can be an identifier, and
the value of the NEXT attribute can be the next STEP
address (i.e., a URL). The value of the NEXTMETHOD
attribute can be a get or a post (i.e., an input into a
Java Script program or a markup language server), and
the value of the TIMEOUT attribute can be a number
represented in milliseconds.

The following example illustrates the use of the
DATE input in a markup language document.

<STEP NAME="init">
  <PROMPT> What is your date of birth? </PROMPT>
  <INPUT TYPE="date" NAME="dob" NEXT="#soc"/>
</STEP>

In the example above, the DATE input is used to
gather the user's birthday, store it in a variable
"dob", and then go to the STEP element named "soc". The
DATE input makes use of an input grammar to interpret
the user's response and store that response in a
standard format.

The DATE input grammar can interpret dates
expressed in several different formats. A fully defined
date, such as "next Friday, July 10th, 1998", is stored
as "07101998|July|10|1998|Friday|next". If the date
cannot be determined from the user's response, the
ambiguous parts of the response will be omitted from the
data. The response "July 4th" is stored as
"????????|July|4|||", "Tomorrow" becomes
"????????|||||tomorrow", "The 15th" is stored as
"????????||15|||", and "Monday" becomes
"????????||||Monday|".
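The stored DATE format described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_date`, its parameters, and the `MONTHS` table are hypothetical and do not appear in the specification, and the sketch assumes the input grammar has already split the response into its parts.

```python
# Hypothetical helper: joins already-recognized date parts into the six
# pipe-delimited fields shown in the examples above.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def encode_date(month=None, day=None, year=None, weekday=None, modifier=None):
    """Return 'mmddyyyy|month|day|year|weekday|modifier' for the given parts."""
    if month and day and year:
        # A fully defined date gets a numeric mmddyyyy stamp.
        stamp = f"{MONTHS.index(month) + 1:02d}{day:02d}{year:04d}"
    else:
        # Otherwise the stamp is eight question marks.
        stamp = "?" * 8
    fields = [stamp, month, day, year, weekday, modifier]
    # Parts that were not spoken are left as empty fields.
    return "|".join("" if field is None else str(field) for field in fields)


print(encode_date("July", 10, 1998, "Friday", "next"))  # 07101998|July|10|1998|Friday|next
print(encode_date("July", 4))                            # ????????|July|4|||
```

The field order (stamp, month, day, year, weekday, modifier) is inferred from the stored examples in the text rather than stated explicitly there.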

The DIGITS input of the INPUT attribute of the
markup language (i.e., <INPUT TYPE="DIGITS" NAME="value"
NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"]
[MIN="value"] [MAX="value"]/>) is used to collect a
series of digits from the user. The DIGITS input
includes a NAME attribute, a NEXT attribute, a
NEXTMETHOD attribute, a TIMEOUT attribute, a MIN
attribute, and a MAX attribute. The value of the NAME
attribute can be an identifier, the value of the NEXT
attribute can be a next step address (i.e., a URL), the
value of the NEXTMETHOD attribute can be a get or a
post, and the value of the TIMEOUT attribute can be a
number represented in milliseconds. The value of the
MIN and MAX attributes can be minimum and maximum
integer values, respectively.

The following example illustrates the use of the
DIGITS input in a markup language document or page.

<STEP NAME="init">
  <PROMPT> Please say your pin now. </PROMPT>
  <INPUT TYPE="digits" NAME="pin" NEXT="#doit"/>
</STEP>

In the example above, the DIGITS input is used to
collect digits from the user, store the number in a
variable named "pin", and then go to the STEP named
"doit". If the user were to speak "four five six" in
response to the PROMPT element, the value "456" would be
stored in the variable "pin". The DIGITS input can
collect the digits 0 (zero) through 9 (nine), but not
other numbers like 20 (twenty). To collect multi-digit
numbers (i.e., 20 (twenty) or 400 (four hundred)), the
NUMBER input can be used as further described below.
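As a rough sketch of the DIGITS behavior described above, the following Python function maps a spoken sequence of single digits to the stored string and rejects other number words. The name `words_to_digits` and the word table are hypothetical, and accepting "oh" as zero is an assumption made for illustration.

```python
# Hypothetical word table; "oh" for zero is an assumption, not from the text.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def words_to_digits(utterance):
    """Convert e.g. 'four five six' to '456'; raise ValueError otherwise."""
    tokens = utterance.lower().replace(",", " ").split()
    try:
        return "".join(DIGIT_WORDS[token] for token in tokens)
    except KeyError as err:
        # Words such as "twenty" are outside the 0-9 digit set.
        raise ValueError(f"not a single digit: {err.args[0]}") from None


print(words_to_digits("four five six"))  # 456
```

A word such as "twenty" raises `ValueError` here, mirroring the statement that such numbers belong to the NUMBER input instead.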

The FORM input of the INPUT attribute of the markup
language (i.e., <INPUT TYPE="FORM" NAME="value"
METHOD="value" ACTION="value" TIMEOUT="value"/>) is used
to collect input from the user, convert the input to
text using the speech to text unit, and send the text to
the markup language server. The FORM input includes a
NAME attribute, a NEXT attribute, a METHOD attribute, an
ACTION attribute, and a TIMEOUT attribute. The value of
the NAME attribute can be an identifier, and the value
of the NEXT attribute can be a next step address (i.e.,
a URL, pointer, or memory address). The value of the
METHOD attribute can be a get or a post, and the value
of the ACTION attribute is a pointer to a script that
processes the input on the server. The value of the
TIMEOUT attribute can be a number represented in
milliseconds.

The FORM input makes use of the speech to text unit
to convert user input to text. The user input is then
sent to the markup language server in a standard HTML
<FORM> text format to be processed by a script on the
server. If the user said "John Smith", then the text
string "john smith" would be sent to the server using
the pointer and address indicated by the ACTION
attribute using the method indicated by the METHOD
attribute in a <FORM> format.

The following is an example of the use of the FORM
input in a markup language document.

<STEP NAME="order form">
  <PROMPT> What would you like to order? </PROMPT>
  <INPUT TYPE="form" NAME="order" NEXT="#next order"
    METHOD="post"
    ACTION="http://www.test.com/cqi-bin/post-query"
    TIMEOUT="200"/>
</STEP>

In the example shown above, the FORM input is used
to collect an order input from the user, store the user
input converted to text in the variable named "order",
go to the next step named "next order", post the text to
the address "http://www.test.com/cqi-bin/post-query",
and use a timeout value of 200 milliseconds.
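To make the "standard HTML <FORM> text format" concrete, here is a minimal Python sketch of how recognized text could be encoded as a form body. It assumes, without confirmation from the specification, that the NAME attribute serves as the form field name and that the usual URL-encoded form encoding is meant; `build_form_body` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_form_body(field_name, recognized_text):
    """Encode one recognized response as a URL-encoded form body.

    Assumption: the INPUT NAME is used as the form field name.
    """
    return urlencode({field_name: recognized_text})


print(build_form_body("order", "john smith"))  # order=john+smith
```

With METHOD="post" such a body would travel in the request body, and with METHOD="get" it would be appended to the ACTION address as a query string, as is conventional for HTML forms.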

The GRAMMAR input of the INPUT attribute of
the markup language (i.e., <INPUT TYPE="GRAMMAR"
SRC="value" NEXT="value" [NEXTMETHOD="value"]
[TIMEOUT="value"]/>, <INPUT TYPE="GRAMMAR" SRC="value"
NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"]>
RENAME elements </INPUT>, or <INPUT TYPE="GRAMMAR"
SRC="value" [TIMEOUT="value"] [NEXT="value"
[NEXTMETHOD="value"]]> RESPONSE elements </INPUT>) is
used to specify an input grammar when interpreting the
user's responses. The GRAMMAR input includes a SRC
attribute, a NEXT attribute, a NEXTMETHOD attribute, and
a TIMEOUT attribute. The value of the SRC attribute can
be a grammar address (i.e., a URL), and the value of the
NEXT attribute can be a next step address (i.e., a URL).
The value of the NEXTMETHOD attribute can be a get or a
post, and the value of the TIMEOUT attribute can be a
number represented in milliseconds.

The following example illustrates the use of the
GRAMMAR input in a markup language document.

<STEP NAME="init">
  <PROMPT> Say the month and year in which the
    credit card expires. </PROMPT>
  <INPUT TYPE="GRAMMAR"
    SRC="gram://.SomeGrammar/month/year"
    NEXT="#stepNineteen"/>
</STEP>

The above example illustrates the use of the
GRAMMAR input to apply a predetermined grammar that
collects a month and year from the user, store
the interpreted values in variables named "month" and
"year", and then go to the step named "stepNineteen".

The HIDDEN input of the INPUT attribute of the
markup language (i.e., <INPUT TYPE="HIDDEN" NAME="value"
VALUE="value"/>) is used to store a value in a variable.
The HIDDEN input includes a NAME attribute and a VALUE
attribute. The value of the NAME attribute can be an
identifier, and the value of the VALUE attribute can be
a literal value.

The following example illustrates the use of the
HIDDEN input in a markup language document.

<STEP NAME="init">
  <PROMPT> Login sequence complete.
    Are you ready to place your order?
  </PROMPT>
  <INPUT TYPE="hidden" NAME="firstname"
    VALUE="Bill"/>
  <INPUT TYPE="hidden" NAME="lastname"
    VALUE="Clinton"/>
  <INPUT TYPE="hidden" NAME="favorite"
    VALUE="fries"/>
  <INPUT TYPE="optionlist">
    <OPTION NEXT="#order"> yes </OPTION>
    <OPTION NEXT="#wait"> not yet </OPTION>
  </INPUT>
</STEP>

In the example shown above, the HIDDEN input is
used to create variables and assign values to those
variables. In this example, the user has completed the
login sequence and certain information is stored in
variables as soon as the user's identity has been
established. This information could then be used later
in the application without requiring another access into
the database.

The MONEY input of the INPUT attribute of the
markup language (i.e., <INPUT TYPE="MONEY" NAME="value"
NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"]/>)
is used to collect monetary amounts from the user. The
MONEY input includes a NAME attribute, a NEXT attribute,
a NEXTMETHOD attribute, and a TIMEOUT attribute. The
value of the NAME attribute can be an identifier, and
the value of the NEXT attribute can be a next step
address (i.e., a URL). The value of the NEXTMETHOD
attribute can be a get or a post, and the value of the
TIMEOUT attribute can be a number represented in
milliseconds.

The MONEY input makes use of an input grammar to
interpret the user's response and store that response in
a standard format. The input grammar is able to
interpret various ways to express monetary amounts. The
data is preferably stored in integer format, in terms of
cents. "Five cents" is stored as "5", "five dollars" is
stored as "500", and "a thousand" is stored as "100000".
In the case where the units are ambiguous, the grammar
assumes dollars, in which case "a thousand" is stored as
if the user had said "a thousand dollars".
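The cents-based storage rule can be illustrated with a short Python sketch. The function `to_cents` and its `unit` parameter are hypothetical, and the sketch assumes the grammar has already reduced the utterance to a whole-number amount plus an optional unit.

```python
def to_cents(amount, unit=None):
    """Return the stored string for an amount given in 'cents' or 'dollars'.

    When no unit was spoken (unit is None), dollars are assumed, as the
    text describes for ambiguous responses.
    """
    if unit == "cents":
        cents = amount
    elif unit in ("dollars", None):
        cents = amount * 100
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return str(cents)


print(to_cents(5, "cents"))    # 5
print(to_cents(5, "dollars"))  # 500
print(to_cents(1000))          # 100000
```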

The following example illustrates the use of the
MONEY input in a markup language document.

<STEP NAME="init">
  <PROMPT> How much would you like to deposit?
  </PROMPT>
  <INPUT TYPE="money" NAME="dep" NEXT="#deposit"/>
</STEP>

In the example shown above, the MONEY input is used to
collect the amount of money that the user would like to
deposit in his account, store that amount in a variable
named "dep", and then go to the STEP named "deposit".

The NONE input of the INPUT attribute of the markup
language (i.e., <INPUT TYPE="NONE" NEXT="value"
[NEXTMETHOD="value"]/>) is used to specify the next
location for the voice browser to go to continue
execution when no response is collected from the user.
The NONE input includes a NEXT attribute and a
NEXTMETHOD attribute. The value of the NEXT attribute
can be a next step address (i.e., a URL), and the value
of the NEXTMETHOD attribute can be a get or a post.

The following example illustrates the use of the
NONE input in a markup language document.

<STEP NAME="init">
  <PROMPT> Welcome to the system. </PROMPT>
  <INPUT TYPE="none" NEXT="#mainmenu"/>
</STEP>

In the example shown above, the NONE input is used
to jump to another STEP element in this dialog without
waiting for any user response. In this example, the
user would hear the phrase "Welcome to the system"
followed immediately by the prompt of the main menu.

The NUMBER input of the INPUT attribute of the markup
language (i.e., <INPUT TYPE="NUMBER" NAME="value"
NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"]/>)
is used to collect numbers from the user. The NUMBER
input includes a NAME attribute, a NEXT attribute, a
NEXTMETHOD attribute, and a TIMEOUT attribute. The
value of the NAME attribute can be an identifier, and
the value of the NEXT attribute can be a next step
address (i.e., a URL). The value of the NEXTMETHOD
attribute can be a get or a post, and the value of the
TIMEOUT attribute can be a number represented in
milliseconds.

The following example illustrates the use of the
NUMBER input in a markup language document or page.

<STEP NAME="init">
  <PROMPT> Please say your age now. </PROMPT>
  <INPUT TYPE="number" NAME="age" NEXT="#doit"/>
</STEP>

In the example shown above, the NUMBER input is
used to collect numbers from the user, store the number
in a variable named "age", and then go to the STEP
element named "doit". If the user were to say
"eighteen" in response to the PROMPT element, the value
"18" would be stored in the variable "age". The NUMBER
input will collect numbers like 20 (i.e., twenty), but
only one number per input. To collect a series of
digits like "four five six" (i.e., "456"), the DIGITS
input can be used as described above.

The OPTIONLIST input of the INPUT attribute of the
markup language (i.e., <INPUT TYPE="OPTIONLIST"
[NAME="value"] [TIMEOUT="value"] [NEXT="value"
[NEXTMETHOD="value"]]> OPTION elements </INPUT>) is
used to specify a list of options from which the user
can select. The OPTIONLIST input includes a NAME
attribute, a NEXT attribute, a NEXTMETHOD attribute, and
a TIMEOUT attribute. The value of the NAME attribute
can be an identifier, and the value of the NEXT
attribute can be a next step URL. The value of the
NEXTMETHOD attribute can be a get or a post, and the
value of the TIMEOUT attribute can be a number
represented in milliseconds.

The OPTIONLIST input is used in conjunction with
the OPTION element, which defines the specific user
responses and the behavior associated with each OPTION
element. The following example illustrates the use of
the OPTIONLIST input in a markup language document.

<STEP NAME="init">
  <PROMPT> What would you like to drink?
  </PROMPT>
  <INPUT TYPE="optionlist">
    <OPTION NEXT="#coke"> coke </OPTION>
    <OPTION NEXT="#coke"> coca-cola </OPTION>
    <OPTION NEXT="#pepsi"> pepsi </OPTION>
    <OPTION NEXT="#rc"> r c </OPTION>
  </INPUT>
</STEP>

In the example shown above, the voice browser will
go to a different STEP element or state depending on
which cola the user selects. If the user said "coke" or
"coca-cola", the voice browser would go to the STEP
element named "coke".

The PHONE input of the INPUT attribute of the markup
language (i.e., <INPUT TYPE="PHONE" NAME="value"
NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"]/>)
is used to collect telephone numbers from the user. The
PHONE input includes a NAME attribute, a NEXT attribute,
a NEXTMETHOD attribute, and a TIMEOUT attribute. The
value of the NAME attribute can be an identifier, and
the value of the NEXT attribute can be a next step
address (i.e., a URL). The value of the NEXTMETHOD
attribute can be a get or a post, and the value of the
TIMEOUT attribute can be a number represented in
milliseconds.

The PHONE input makes use of an input grammar to
interpret the user's response and store that response in
a standard format. The phone number is interpreted as a
string of digits and stored in a variable. If a user
said "One, eight zero zero, seven five nine, eight eight
eight eight", the response would be stored as
"18007598888".

The following is an example of the use of the PHONE
input in a markup language document.

<STEP NAME="phone">
  <PROMPT> What is your phone number? </PROMPT>
  <INPUT TYPE="phone" NAME="ph" NEXT="#fax"/>
</STEP>

In the example shown above, the PHONE input is
used to collect a telephone number from the user, store
the number in the variable named "ph", and go to the
STEP named "fax".

The PROFILE input of the INPUT attribute of the markup
language (i.e., <INPUT TYPE="PROFILE" NAME="value"
PROFNAME="value" [SUBTYPE="value"]/>) is used to
collect the user's profile information (i.e., first name,
last name, mailing address, email address, and
notification address). The user profile information is
stored in the database 244 of the system.

The PROFILE input includes a NAME attribute, a
PROFNAME attribute, and a SUBTYPE attribute. The value
of the NAME attribute can be an identifier, the value of
the PROFNAME attribute can be a profile element name
(string), and the value of the SUBTYPE attribute can be
a profile element subtype (string).

The following example illustrates the use of the
PROFILE input in a markup language document.

<STEP NAME="getinfo">
  <INPUT TYPE="profile" NAME="firstname"
    PROFNAME="N" SUBTYPE="first"/>
  <PROMPT> Hello, <VALUE NAME="firstname"/>.
    Please say your pin. </PROMPT>
  <INPUT TYPE="digits" NAME="pin" NEXT="#verify"/>
</STEP>

In the example above, the PROFILE input is used to
retrieve the user's first name and store the string in a
variable named "firstname". The string containing the
name is then inserted into the PROMPT element using a
VALUE element as further described below. When using
the PROFILE input, more than one INPUT element can be
included in the same STEP element because the PROFILE
input is not an interactive INPUT element. Each STEP
element contains only one INPUT element that accepts a
response from the user.

The following table lists the valid combinations of
profile names and their associated subtypes.

Profile Name    Subtype         Description
ADR             POSTAL          postal address
                PARCEL          parcel address
                HOME            home address
                WORK            work address
                DOM             domestic address
                INTL            international address
BDAY            none            birthday
EMAIL           none            primary email address
                NOTIFICATION    notification email address
FN              none            formatted name
GEO             none            geographic location
                                (longitude; latitude)
KEY             none            public encryption key
LABEL           none            mailing label
MAILER          none            email program used
N               FIRST           first name
                LAST            last name
                MIDDLE          middle name
                PREFIX          prefix (e.g. Mr., Mrs., Dr.)
                SUFFIX          suffix (e.g. Jr., D.D.S, M.D.)
ORG             none            organization
ROLE            none            job role or position
TEL             HOME            home telephone number
                WORK            work telephone number
                MSG             voicemail telephone number
                VOICE           voice call telephone number
                FAX             fax call telephone number
                CELL            cellular telephone number
                PREF            preferred telephone number
TITLE           none            job title
TZ              none            time zone
UID             none            globally unique id
URL             none            URL of home page
VERSION         none            version of Vcard


The notification address shown above can be used to
send a user urgent or timely information (i.e., sending
information to a pager). The format of the notification
address is preferably that of an email address provided
by the user when his or her subscription is activated.
The user's notification address would be stored in a
variable named "n_addr". The application could then use
this email address to send a message to the user. To
retrieve the notification address from the voice
browser, the PROFILE input can be used in a markup
language document in the following manner:

<INPUT TYPE="profile" NAME="n_addr"
  PROFNAME="email" SUBTYPE="notification"/>

The RECORD input of the INPUT attribute of the
markup language (i.e., <INPUT TYPE="RECORD"
TIMEOUT="value" STORAGE="value" [FORMAT="value"]
[NAME="value"] NEXT="value" [NEXTMETHOD="value"]/>) is
used to record an audio sample and to store that audio
sample in a specified location. The RECORD input
includes a TIMEOUT attribute, a FORMAT attribute, a NAME
attribute, a STORAGE attribute, a NEXT attribute, and a
NEXTMETHOD attribute. The value of the TIMEOUT
attribute can be the maximum record time represented in
milliseconds, the value of the FORMAT attribute can be a
recorded audio format (audio/wav), the value of the NAME
attribute can be an identifier, the value of the STORAGE
attribute can be a file or a request, the value of the
NEXT attribute can be a next step address (i.e., a URL),
and the value of the NEXTMETHOD attribute can be a get,
post, or put.

The following two examples illustrate the RECORD
input in a markup language document.

<STEP NAME="init">
  <PROMPT> Please say your first and last name.
  </PROMPT>
  <INPUT TYPE="record" TIMEOUT="7000"
    NAME="theName" STORAGE="REQUEST"
    NEXT="http://wavhost/acceptwav.asp"
    NEXTMETHOD="POST"/>
</STEP>

In the example shown above, the RECORD input is
used to record a seven-second audio sample, and then
"POST" that sample to the remote machine named
"wavhost". The response to the "POST" has to be a
dialog which continues the execution of the application.

<STEP NAME="init">
  <PROMPT> Please say your first and last name.
  </PROMPT>
  <INPUT TYPE="record" TIMEOUT="7000"
    NAME="theName" STORAGE="FILE"
    NEXT="#reccomplete" NEXTMETHOD="GET"/>
</STEP>

In the example shown above, the RECORD input is
used to record another seven-second audio sample.
However, the sample is stored in a file, instead of sent
in the HTTP request as it was in the previous example.
The name of the file is chosen by the voice browser
automatically and is stored in a variable named
"theName". After storing the audio sample in the file,
the voice browser will continue execution at the URL
specified by the NEXT attribute. In contrast to the
previous example, the value of the variable "theName"
will be the name of the audio file. In the earlier
example (where the audio sample was transmitted via the
HTTP request), the value of the variable "theName" would
be null.

The TIME input of the INPUT attribute of the
markup language (i.e., <INPUT TYPE="TIME" NAME="value"
NEXT="value" [NEXTMETHOD="value"] [TIMEOUT="value"]/>)
is used to collect a time of day from the user. The
TIME input includes a NAME attribute, a NEXT attribute,
a NEXTMETHOD attribute, and a TIMEOUT attribute. The
value of the NAME attribute can be an identifier, and
the value of the NEXT attribute can be a next step
address (i.e., a URL). The value of the NEXTMETHOD
attribute can be a get or a post, and the value of the



TIMEOUT attribute can be a number represented in
milliseconds.

The TIME input makes use of an input grammar to
interpret the user's response and to store that response
in a standard format. This grammar will interpret
responses of various forms, including both 12-hour and
24-hour conventions. "Four oh three PM" becomes "403P".
Note that "P" is appended to the time. Likewise, "Ten
fifteen in the morning" becomes "1015A". "Noon" is
stored as "1200P", and "Midnight" is stored as "1200A".
Military time, such as "Thirteen hundred hours", becomes
"100P". If the user does not specify the morning or
evening, no indication is stored in the variable
(i.e., "Four o'clock" is stored as "400").
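The stored TIME format can be sketched as follows. The function `encode_time` and its parameters are hypothetical; the sketch assumes the grammar has already produced an hour, a minute, and either an "A"/"P" marker or a flag saying the response used the 24-hour convention.

```python
def encode_time(hour, minute, meridiem=None, twenty_four_hour=False):
    """Return the stored time string, e.g. '403P', '1015A', or '400'."""
    if twenty_four_hour:
        # Derive the marker and fold the hour onto a 12-hour clock.
        meridiem = "P" if hour >= 12 else "A"
        hour = hour % 12 or 12
    # The hour is not zero-padded; the minute always has two digits.
    return f"{hour}{minute:02d}{meridiem or ''}"


print(encode_time(4, 3, "P"))                     # 403P
print(encode_time(13, 0, twenty_four_hour=True))  # 100P
print(encode_time(4, 0))                          # 400
```

Under the same rules, noon given as 12:00 with "P" yields "1200P", and midnight given as 0:00 on the 24-hour clock yields "1200A".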

The following example illustrates the TIME input in
a markup language document.

<STEP NAME="init">
  <PROMPT> What time would you like your wakeup
    call? </PROMPT>
  <INPUT TYPE="time" NAME="wakeup" NEXT="#record"/>
</STEP>

In the example shown above, the TIME input is used
to collect a time of day from the user, store that data
in the variable named "wakeup", and then go to the STEP
element named "record".

The YORN input of the INPUT attribute of the markup
language (i.e., <INPUT TYPE="YORN" NAME="value"
[TIMEOUT="value"] NEXT="value" [NEXTMETHOD="value"]/>,
or <INPUT TYPE="YORN" [NAME="value"] [TIMEOUT="value"]
[NEXT="value" [NEXTMETHOD="value"]]> CASE elements
</INPUT>) is used to collect "yes" or "no" responses
from the user. The YORN input includes a NAME
attribute, a NEXT attribute, a NEXTMETHOD attribute, and
a TIMEOUT attribute. The value of the NAME attribute
can be an identifier, and the value of the NEXT
attribute can be a next step address (i.e., a URL). The
value of the NEXTMETHOD attribute can be a get or a
post, and the value of the TIMEOUT attribute can be a
number represented in milliseconds.

The YORN input maps a variety of affirmative and
negative responses to the values "Y" and "N". The YORN
input stores the value "Y" for affirmative responses and
the value "N" for negative responses. Affirmative and
negative responses are determined using an input grammar
that maps various user responses to the appropriate
result.
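A minimal Python sketch of the YORN mapping follows. The word lists are purely illustrative assumptions, since the specification does not enumerate the responses its grammar accepts, and the function name `yorn` is hypothetical.

```python
# Illustrative word lists only; the actual grammar is not given in the text.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "ok"}
NEGATIVE = {"no", "nope", "nah"}


def yorn(utterance):
    """Map a response to 'Y' or 'N'; return None if it is not recognized."""
    word = utterance.strip().lower()
    if word in AFFIRMATIVE:
        return "Y"
    if word in NEGATIVE:
        return "N"
    return None


print(yorn("Yes"))   # Y
print(yorn("nope"))  # N
```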

The following example illustrates the use of the
YORN input in a markup language document.

<STEP NAME="ask">
  <PROMPT> Fire the missiles now? </PROMPT>
  <INPUT TYPE="YORN" NAME="fire" NEXT="#confirm"/>
</STEP>

In the example shown above, the YORN input is used
to collect a "yes" or "no" response from the user, store
that response into a variable named "fire", and then go
to the STEP named "confirm".

The OPTION element of the markup language (i.e.,
<OPTION [NEXT="value" [NEXTMETHOD="value"]]
[VALUE="value"]> text </OPTION>) is used to define the
type of response expected from the user in a STEP
element or state. The OPTION element includes a VALUE
attribute, a NEXT attribute, and a NEXTMETHOD attribute.
The value of the VALUE attribute can be a literal value,
the value of the NEXT attribute can be a next step
address (i.e., a URL), and the value of the NEXTMETHOD
attribute can be a get or a post. The OPTION element
can exist only within the INPUT element, and then only
when using the OPTIONLIST input.

The following two examples illustrate the use of
the OPTION element in a markup language document.

<INPUT NAME="choice" TYPE="optionlist">
  <OPTION NEXT="#doit" VALUE="1"> one </OPTION>
  <OPTION NEXT="#doit" VALUE="2"> two </OPTION>
</INPUT>

The example shown above illustrates the use of the
OPTION element within the INPUT element. In this
example, the first OPTION element would be executed when
the user responded with "one", and the second OPTION
element would be executed when the user responded with
"two". If the user said "one", the value of the variable
named "choice" would be "1", because of the use of the
VALUE attribute. Because the NEXT attributes for both of
the OPTION elements in this OPTIONLIST input are the
same, the voice browser would proceed to the STEP element
named "doit" when either "one" or "two" was recognized.
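The selection behavior of the example above can be sketched in Python as a lookup over the OPTION entries. The data layout and the name `select_option` are hypothetical, and the fallback of storing the spoken text when no VALUE attribute is present is an assumption, not something the specification states.

```python
# Hypothetical in-memory form of the two OPTION elements shown above.
OPTIONS = [
    {"text": "one", "value": "1", "next": "#doit"},
    {"text": "two", "value": "2", "next": "#doit"},
]


def select_option(utterance, options):
    """Return (stored_value, next_step) for the matching option, else None."""
    spoken = utterance.strip().lower()
    for option in options:
        if option["text"] == spoken:
            # Assumption: without a VALUE attribute the spoken text is stored.
            return option.get("value", option["text"]), option["next"]
    return None


print(select_option("one", OPTIONS))  # ('1', '#doit')
```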

<INPUT TYPE="optionlist">
  <OPTION
    NEXT="http://localhost/vml/weather.asp">
    weather </OPTION>
  <OPTION NEXT="http://localhost/vml/news.asp">
    news </OPTION>
  <OPTION
    NEXT="http://localhost/vml/traffic.asp">
    traffic </OPTION>
</INPUT>

The example shown above illustrates the use of the
OPTION element to select one of three applications.
Note that the URLs used in the NEXT attributes are full
HTTP URLs, and that unlike the previous example, each
OPTION element has a unique NEXT attribute.

The OPTIONS element of the markup language (i.e.,
<OPTIONS/>) describes the type of input expected within
a given STEP element. The OPTIONS element can be used
in HELP elements to present the user with a complete
list of valid responses. The OPTIONS element can be used
anywhere that text is read to the user. The OPTIONS
element can be contained by a PROMPT, EMP, PROS, HELP,
ERROR, or ACK element.

The following example illustrates the use of the
OPTIONS element in a markup language document.

<CLASS NAME="helpful">
  <HELP> Your choices are: <OPTIONS/> </HELP>
</CLASS>

The example shown above illustrates how the OPTIONS
element can be used to construct a "helpful" class. Any
STEP elements that directly or indirectly name "helpful"
as a PARENT element respond to a help request (i.e.,
"help") by speaking the message, in which the OPTIONS
element expands to a description of what can be said by
the user at this point in the dialog.

The ACK element of the markup language (i.e., <ACK 
[ CONFIRM^ " value " ] [ BACKGROUND= " value " ] 

[ REPROMPT= " value " ] > text </ACK>) is used to acknowledge 

20 the transition between Step elements, usually as a 

result of a user response. The ACK element includes a 

CONFIRM attribute, a BACKGROUND attribute, and a 

REPROMPT attribute. The value of the BACKGROUND and 

REPROMPT attributes can be a "Y" and "N" , and the 

25 CONFIRM attribute can be a YORN element as described 

above. The ACK element can be contained within a STEP 

element or a CLASS element as further described below. 

The following is an example of a markup language 

document containing the Ack element. 

30 <STEP NAME= n card_type M > 

<PROMPT> 

What type of credit card do you have? 
</PROMPT> 

<INPUT NAME= M type " TYPE="optionlist"> 
35 <OPTION NEXT="#exp"> visa </OPTION> 

<OPTION NEXT="#exp"> mastercard 

</OPTION> 

<OPTION NEXT= u #exp"> discover </OPTION> 
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</lNPUT> 

<ACK CONFIRM= " YORN " REPROMPT= " Y " > 

I thought you said <VALUE NAME= M type" /> 
<BREAK/> Is that correct? 
5 </ACK> 
</STEP> 

In the example above, the ACK element is used to confirm the user's choice of credit card. When this element is interpreted by the voice browser, the PROMPT element is read to the user using text-to-speech unit 252. The system waits until the user responds with "visa", "mastercard", or "discover" and then asks the user to confirm that the type of card was recognized correctly. If the user answers "yes" to the ACK element, the voice browser will proceed to the STEP element named "exp". If the user answers "no" to the ACK element, the text of the PROMPT element will be read again, and the user will be allowed to make his or her choice again. The voice browser then re-enters, or executes, the STEP element again.
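
The confirmation flow just described can be sketched as a small simulation. The Python below is only an illustrative model under stated assumptions: the function run_card_type_step and the list-based stand-ins for recognition and speech output are hypothetical, and unrecognized input simply re-enters the step, whereas a voice browser would handle it through an ERROR element.

```python
from typing import Iterator, List

VALID_CARDS = {"visa", "mastercard", "discover"}


def run_card_type_step(utterances: Iterator[str], spoken: List[str]) -> str:
    """Simulate the card_type step; return the name of the next step."""
    while True:
        spoken.append("What type of credit card do you have?")  # PROMPT
        choice = next(utterances)
        if choice not in VALID_CARDS:
            continue  # simplification: re-enter the step on unrecognized input
        # ACK with CONFIRM="YORN": read the choice back and ask yes or no.
        spoken.append(f"I thought you said {choice}. Is that correct?")
        if next(utterances) == "yes":
            return "exp"  # NEXT="#exp" on the chosen OPTION
        # "no": the step is re-entered, so the loop reads the PROMPT again.


spoken: List[str] = []
next_step = run_card_type_step(iter(["visa", "no", "mastercard", "yes"]), spoken)
print(next_step, spoken)
```

In this run the user rejects the first recognition result, so the prompt is played twice before control passes to the "exp" step.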

The AUDIO element of the markup language (i.e., <AUDIO SRC="value"/>) specifies an audio file that should be played. The AUDIO element includes a SRC attribute. The value of the SRC attribute can be an audio file URL. The AUDIO element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.

The following markup language contains the AUDIO element.

<PROMPT>
  At the tone, the time will be 11:59 pm
  <AUDIO SRC="http://localhost/sounds/beep.wav"/>
</PROMPT>



In the example above, the AUDIO element is included in a PROMPT element. When interpreted by the voice browser, a prompt (i.e., "At the tone, the time will be 11:59 pm.") will be played to the user, and the WAV file "beep.wav" will be played to the user as specified by the AUDIO element.

The BREAK element of the markup language (i.e., <BREAK [MSECS="value" | SIZE="value"]/>) is used to insert a pause into content or information to be played to the user. The BREAK element includes a MSECS attribute and a SIZE attribute. The value of the MSECS attribute can be a number of milliseconds, and the value of the SIZE attribute can be none, small, medium, or large.

The BREAK element can be used when text or an audio sample is to be played to the user. The BREAK element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element. The following markup language contains the BREAK element.
<PROMPT>
  Welcome to Earth. <BREAK MSECS="250"/>
  How may I help you?
</PROMPT>

In the example above, the BREAK element is used with a MSECS attribute, inside a PROMPT element. When interpreted by the voice browser, a prompt (i.e., "Welcome to Earth.") is read to the user. The system will then pause for 250 milliseconds, and play "How may I help you?".

Alternatively, the SIZE attribute (i.e., "small", "medium", and "large") of the BREAK element can be used to control the duration of the pause instead of specifying the number of milliseconds as shown below.

<PROMPT>
  Welcome to Earth. <BREAK SIZE="medium"/>
  How may I help you?
</PROMPT>
The OR element of the markup language (i.e., <OR/>) is used to define alternate recognition results in an OPTION element. The OR element is interpreted as a logical OR, and is used to associate multiple recognition results with a single NEXT attribute.

The following example illustrates the use of the OR element in a markup language document.

<INPUT TYPE="optionlist">
  <OPTION NEXT="#coke_chosen">
    coke <OR/> coca-cola
  </OPTION>
  <OPTION NEXT="#pepsi_chosen"> pepsi </OPTION>
</INPUT>

The example shown above illustrates the use of the OR element within an OPTION element. As shown above, the user may respond with either "coke" or "coca-cola", and the voice browser will proceed to the STEP named "coke_chosen".

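One way to picture the effect of the OR element is as a lookup table in which several phrases share one NEXT target. The Python sketch below builds such a table from the example markup; the function name build_option_table and the lower-casing of phrases are assumptions for illustration and do not describe how a speech recognizer actually matches utterances.

```python
import xml.etree.ElementTree as ET
from typing import Dict

INPUT_XML = """
<INPUT TYPE="optionlist">
  <OPTION NEXT="#coke_chosen"> coke <OR/> coca-cola </OPTION>
  <OPTION NEXT="#pepsi_chosen"> pepsi </OPTION>
</INPUT>
"""


def build_option_table(input_xml: str) -> Dict[str, str]:
    """Map every recognizable phrase to the NEXT target of its OPTION."""
    table: Dict[str, str] = {}
    for option in ET.fromstring(input_xml).iter("OPTION"):
        # The text before the first <OR/> plus the text after each <OR/>
        # are the alternative phrases for this OPTION.
        phrases = [option.text or ""]
        phrases += [child.tail or "" for child in option if child.tag == "OR"]
        for phrase in phrases:
            phrase = phrase.strip().lower()
            if phrase:
                table[phrase] = option.get("NEXT", "")
    return table


option_table = build_option_table(INPUT_XML)
print(option_table)
```
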
The CANCEL element of the markup language (i.e., <CANCEL NEXT="value" [NEXTMETHOD="value"]/> or <CANCEL NEXT="value" [NEXTMETHOD="value"]> text </CANCEL>) is used to define the behavior of the application in response to a user's request to cancel the current PROMPT element. The CANCEL element includes a NEXT attribute and a NEXTMETHOD attribute. The value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.

The CANCEL element can be invoked through a variety of phrases. For example, the user may say only the word "cancel", or the user may say "I would like to cancel, please." The CANCEL element can be contained within a STEP element or a CLASS element. When the voice browser detects "cancel" from the user, the voice browser responds based upon the use of the CANCEL element in the markup language document. If no CANCEL element is associated with a given STEP element, the current prompt will be interrupted (if it is playing), and the voice browser will stay in the same application state and then process any interactive inputs.

The following example illustrates a markup language document containing the CANCEL element.

<STEP NAME="report">
  <CANCEL NEXT="#traffic_menu"/>
  <PROMPT> Traffic conditions for Chicago, Illinois,
    Monday, May 18. Heavy congestion on ... </PROMPT>
  <INPUT TYPE="optionlist">
    <OPTION NEXT="#report"> repeat </OPTION>
    <OPTION NEXT="#choose"> new city </OPTION>
  </INPUT>
</STEP>

The example above illustrates the use of the CANCEL element to specify that when the user says "cancel", the voice browser proceeds to the STEP element named "traffic_menu", instead of the default behavior, which would be to stop the PROMPT element from playing and wait for a user response. The user can also interrupt the PROMPT element by speaking a valid OPTION element. In this example, the user could interrupt the PROMPT element and get the traffic conditions for a different city by saying "new city".

The CASE element of the markup language (i.e., <CASE VALUE="value" NEXT="value" [NEXTMETHOD="value"]/>) is used to define the flow of control of the application, based on the values of internal markup language variables. The CASE element includes a VALUE attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the VALUE attribute can be a literal value, the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post. The CASE element can be contained by a SWITCH element, or by an INPUT element when using an input type of the INPUT element that collects a single value (i.e., DATE, DIGITS, MONEY, PHONE, TIME, YORN).

The following example illustrates a markup language document containing a CASE element.

<SWITCH FIELD="pizza">
  <CASE VALUE="pepperoni" NEXT="#p_pizza"/>
  <CASE VALUE="sausage" NEXT="#s_pizza"/>
  <CASE VALUE="veggie" NEXT="#v_pizza"/>
</SWITCH>

In the example above, the markup language shows the use of the CASE element within the SWITCH element. In this example, the CASE elements are used to direct the voice browser to different URLs based on the value of the markup language variable "pizza".

The CLASS element of the markup language (i.e., <CLASS NAME="value" [PARENT="value"] [BARGEIN="value"] [COST="value"]> text </CLASS>) is used to define a set of elements that are to be reused within the content of a dialog. For example, application developers can define a set of elements once, and then use them several times. The CLASS element includes a NAME attribute, a PARENT attribute, a BARGEIN attribute, and a COST attribute. The value of the NAME and the PARENT attributes can be an identifier. The value of the BARGEIN attribute can be "Y" or "N", and the value of the COST attribute can be an integer number.

The CLASS element can be used to define the default behavior of an ERROR element, a HELP element, and a CANCEL element, within a given DIALOG element. The CLASS element can be contained by a DIALOG element. The following example shows a markup language document containing the CLASS element.

<CLASS NAME="simple">
  <HELP> Your choices are <OPTIONS/> </HELP>
  <ERROR> I did not understand what you said.
    Valid responses are <OPTIONS/> </ERROR>
</CLASS>

<STEP NAME="beverage" PARENT="simple">
  <PROMPT> Please choose a drink. </PROMPT>
  <INPUT NAME="drink" TYPE="optionlist">
    <OPTION NEXT="#food"> coke </OPTION>
    <OPTION NEXT="#food"> pepsi </OPTION>
  </INPUT>
</STEP>

<STEP NAME="food" PARENT="simple">
  <PROMPT> Please choose a meal. </PROMPT>
  <INPUT NAME="meal" TYPE="optionlist">
    <OPTION NEXT="#deliver"> pizza </OPTION>
    <OPTION NEXT="#deliver"> tacos </OPTION>
  </INPUT>
</STEP>

In the example above, the markup language document illustrates the use of the CLASS element to define a HELP element and an ERROR element that will be used in several steps within this DIALOG element. The markup language also illustrates the use of the PARENT attribute in the STEP element to refer to the CLASS element, and therefore inherit the behaviors defined within it. When interpreted by the voice browser, the STEP element will behave as if the HELP and ERROR elements that are defined in the CLASS element were defined explicitly in the steps themselves.
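
The inheritance through the PARENT attribute can be modeled as a walk up a chain of definitions. The following Python sketch is a hypothetical model, not the voice browser's implementation: the DEFINITIONS table and the function resolve_handler are invented for illustration, and returning None stands in for falling back to the browser's default behavior.

```python
from typing import Dict, Optional

# Hypothetical in-memory form of the CLASS and STEP definitions above:
# each entry lists its PARENT (if any) and the handlers it defines itself.
DEFINITIONS: Dict[str, dict] = {
    "simple": {
        "parent": None,
        "handlers": {
            "HELP": "Your choices are <OPTIONS/>",
            "ERROR": "I did not understand what you said. "
                     "Valid responses are <OPTIONS/>",
        },
    },
    "beverage": {"parent": "simple", "handlers": {}},
    "food": {"parent": "simple", "handlers": {}},
}


def resolve_handler(name: str, kind: str) -> Optional[str]:
    """Find a HELP, ERROR, or CANCEL handler on a step or its PARENT chain."""
    seen = set()
    current: Optional[str] = name
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental PARENT cycles
        entry = DEFINITIONS[current]
        if kind in entry["handlers"]:
            return entry["handlers"][kind]
        current = entry["parent"]
    return None  # caller would fall back to the default behavior


print(resolve_handler("beverage", "HELP"))
```

A handler defined directly on a step is found first, so a step can still override what its class provides.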

The EMP element of the markup language (i.e., <EMP [LEVEL="value"]> text </EMP>) is used to identify content within text that will be read to the user where emphasis is to be applied. The EMP element includes a LEVEL attribute. The value of the LEVEL attribute can be none, reduced, moderate, or strong. The EMP element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element. The following example of a markup language document contains the EMP element.

<PROMPT>
  This example is
  <EMP LEVEL="strong"> really </EMP>
  simple.
</PROMPT>

In the above example, the EMP element is used to apply "strong" emphasis to the word "really" in the PROMPT element. The actual effect on the speech output is determined by the text-to-speech (TTS) software of the system. To achieve a specific emphatic effect, the PROS element, as further described below, can be used instead of the EMP element.

The ERROR element of the markup language (i.e., <ERROR [TYPE="value"] [ORDINAL="value"] [REPROMPT="value"] [NEXT="value" [NEXTMETHOD="value"]]> text </ERROR>) is used to define the behavior of the application in response to an error. The ERROR element includes a TYPE attribute, an ORDINAL attribute, a REPROMPT attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the TYPE attribute can be all, nomatch, nospeech, toolittle, toomuch, noauth, or badnext. The value of the ORDINAL attribute can be an integer number, the value of the REPROMPT attribute can be "Y" or "N", the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.

If the application developer does not define the behavior of an ERROR element for a given STEP element, the default behavior will be used. The default behavior for the ERROR element is to play the phrase "An error has occurred.", remain in the current STEP element, replay the PROMPT element, and wait for the user to respond. The ERROR element can be contained within a STEP or a CLASS element.

The following example illustrates the use of the ERROR element in a markup language document.

1  <STEP NAME="errors">
2    <ERROR TYPE="nomatch"> First error message.
3      I did not understand what you said. </ERROR>
4    <ERROR TYPE="nomatch" ORDINAL="2">
5      Second error message.
6      I did not understand what you said. </ERROR>
7    <PROMPT> This step tests error messages.
8      Say 'oops' twice. Then say 'done' to
9      choose another test. </PROMPT>
10   <INPUT TYPE="OPTIONLIST">
11     <OPTION NEXT="#end"> done </OPTION>
12   </INPUT>
13 </STEP>

In the example above, the ERROR element is used to define the application's behavior in response to an error. On line 2, the error message is defined to be used the first time an error of type "nomatch" occurs in this STEP element. On line 4, the error message is defined to be used the second and all subsequent times an error of type "nomatch" occurs in this STEP element.

The ORDINAL attribute of the ERROR element of the markup language determines which message will be used in the case of repeated errors within the same STEP element. The voice browser can choose an error message based on the following algorithm. If the error has occurred three times, the voice browser will look for an ERROR element with an ORDINAL attribute of "3". If no such ERROR element has been defined, the voice browser will look for an ERROR element with an ORDINAL attribute of "2", and then "1", and then an ERROR element with no ORDINAL attribute defined. Thus, if an ERROR element is defined with the ORDINAL attribute of "6" in the STEP element shown above, and the same error occurred six times in a row, the user would hear the first error message one time, then the second error message four times, and finally the error message with the ORDINAL attribute of "6".
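
The selection algorithm described above can be sketched in a few lines of Python. The function name select_message and the dictionary representation (with the key None standing for an element that has no ORDINAL attribute) are assumptions made for illustration; the same fallback order is described below for the HELP element.

```python
from typing import Dict, List, Optional


def select_message(messages: Dict[Optional[int], str], count: int) -> Optional[str]:
    """Pick the message for the count-th occurrence within one STEP.

    `messages` maps an ORDINAL value to its text; the key None stands for
    an element with no ORDINAL attribute.
    """
    for ordinal in range(count, 0, -1):  # count, count - 1, ..., 1
        if ordinal in messages:
            return messages[ordinal]
    return messages.get(None)  # finally, the element without ORDINAL


# The two messages of the example plus a hypothetical ORDINAL="6" message.
error_messages = {None: "first", 2: "second", 6: "sixth"}
heard: List[Optional[str]] = [select_message(error_messages, n) for n in range(1, 7)]
print(heard)
```

Running the sketch for six consecutive errors yields the first message once, the second message four times, and then the message with ORDINAL "6", which matches the sequence given in the text.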

The HELP element of the markup language (i.e., <HELP [ORDINAL="value"] [REPROMPT="value"] [NEXT="value" [NEXTMETHOD="value"]]> text </HELP>) is used to define the behavior of the application when the user asks for help. The HELP element includes an ORDINAL attribute, a REPROMPT attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the ORDINAL attribute can be an integer number, and the value of the REPROMPT attribute can be "Y" or "N". The value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.

The HELP element, like the CANCEL element, can be detected through a variety of phrases. The user may say only the word "help", or the user may say "I would like help, please." In either case, the HELP element will be interpreted. The HELP element can be contained within a STEP element or a CLASS element.

When the voice browser detects "help" from the user, the voice browser responds based upon the use of the HELP element in the markup language document. If no HELP element is associated with a given STEP, the current prompt will be interrupted (if it is playing), the user will hear "No help is available.", and the voice browser will stay in the same application state and process any interactive inputs.

The following example illustrates the use of the HELP element in a markup language document.

1  <STEP NAME="helps">
2    <HELP REPROMPT="Y"> First help message.
3      You should hear the prompt again. </HELP>
4    <HELP ORDINAL="2"> Second help message.
5      You should not hear the prompt now. </HELP>
6    <PROMPT> This step tests help prompts.
7      Say 'help' twice. Then say 'done' to
8      choose another test. </PROMPT>
9    <INPUT TYPE="OPTIONLIST">
10     <OPTION NEXT="#end"> done </OPTION>
11   </INPUT>
12 </STEP>

In the example above, the HELP element is used to define the application's behavior in response to the user input "help". On line 2, the help message is defined to be used the first time the user says "help". On line 4, the help message is defined to be used the second and all subsequent times the user says "help". It should also be noted that through the use of the REPROMPT attribute, the prompt will be repeated after the first help message, but it will not be repeated after the second help message.

The ORDINAL attribute of the HELP element of the markup language determines which message will be used in the case of repeated utterances of "help" within the same STEP element. The voice browser will choose a help message based on the following algorithm. If the user has said "help" three times, the voice browser will look for a HELP element with an ORDINAL attribute of "3". If no such HELP element has been defined, the voice browser will look for a HELP element with an ORDINAL attribute of "2", and then "1", and then a HELP element with no ORDINAL attribute defined. Thus, if a HELP element is defined with the ORDINAL attribute of "6" in the STEP element shown above, and the user said "help" six times in a row, the user would hear the first help message one time, then the second help message four times, and finally the help message with the ORDINAL attribute of "6".
The PROS element of the markup language (i.e., <PROS [RATE="value"] [VOL="value"] [PITCH="value"] [RANGE="value"]> text </PROS>) is used to control the prosody of the content presented to the user via PROMPT, HELP, ERROR, CANCEL, and ACK elements. Prosody affects certain qualities of the text-to-speech presentation, including rate of speech, pitch, range, and volume. The PROS element includes a RATE attribute, a VOL attribute, a PITCH attribute, and a RANGE attribute. The value of the RATE attribute can be an integer number representing words per minute, and the value of the VOL attribute can be an integer number representing volume of speech. The value of the PITCH attribute can be an integer number representing pitch in hertz, and the value of the RANGE attribute can be an integer number representing range in hertz. The PROS element can be contained within a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.

The following example illustrates the use of the PROS element.

<PROMPT> Let me tell you a secret:
  <PROS VOL="0.5"> I ate the apple. </PROS>
</PROMPT>

In the example shown above, the phrase "I ate the apple" is spoken with one half of the normal volume.

The RENAME element of the markup language (i.e., <RENAME RECNAME="value" VARNAME="value"/>) is used to rename recognition slots in grammars, such that the resulting variable name can be different from the name of the recognition slot defined in the grammar. The RENAME element includes a VARNAME attribute and a RECNAME attribute. The value of the VARNAME and the RECNAME attributes can be identifiers. The RENAME element can exist only within the INPUT element, and then only when using the GRAMMAR input type.

The following example illustrates the use of the RENAME element in a markup language document.

<INPUT TYPE="GRAMMAR"
       SRC="http://www.foo.com/mygram.grm"
       NEXT="http://www.fancyquotes.com/vmlstocks.asp">
  <RENAME VARNAME="sym" RECNAME="symbol"/>
  <RENAME VARNAME="detail" RECNAME="quotetype"/>
</INPUT>

In the example shown above, the RENAME element is used to account for differences in the variable names collected from a grammar and those expected by another script. In particular, a grammar from foo.com is used to provide input to an application hosted by fancyquotes.com. Because, in this example, the grammar and script have been developed independently, the RENAME element is used to help connect the grammar and the stock-quoting application.
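
The effect of the two RENAME elements can be illustrated as a simple key mapping. In the Python sketch below, the function apply_renames and the sample recognition values ("MOT", "last") are hypothetical; only the slot and variable names come from the example.

```python
from typing import Dict


def apply_renames(slots: Dict[str, str], renames: Dict[str, str]) -> Dict[str, str]:
    """Rename recognition slots (RECNAME) to variable names (VARNAME)."""
    return {renames.get(recname, recname): value for recname, value in slots.items()}


# RECNAME -> VARNAME, as in the two RENAME elements of the example.
RENAMES = {"symbol": "sym", "quotetype": "detail"}

# Hypothetical recognition result produced by the grammar.
recognized = {"symbol": "MOT", "quotetype": "last"}
variables = apply_renames(recognized, RENAMES)
print(variables)
```

Slots without a RENAME element keep the name given to them by the grammar.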

The RESPONSE element of the markup language (i.e., <RESPONSE FIELDS="value" [NEXT="value" [NEXTMETHOD="value"]]/> or <RESPONSE FIELDS="value" [NEXT="value" [NEXTMETHOD="value"]]> SWITCH elements </RESPONSE>) is used to define the behavior of an application in response to different combinations of recognition slots. The RESPONSE element includes a FIELDS attribute, a NEXT attribute, and a NEXTMETHOD attribute. The value of the FIELDS attribute can be a list of identifiers, the value of the NEXT attribute can be a next step address (i.e., a URL), and the value of the NEXTMETHOD attribute can be get or post.

The RESPONSE element enables application developers to define a different NEXT attribute depending on which of the grammar's slots were filled. The RESPONSE element can exist within an INPUT element, and then only when using an input type of grammar.

The following example illustrates the RESPONSE element in a markup language document.
<INPUT TYPE="GRAMMAR"
       SRC="gram://.Banking/action/amt/fromacct/toacct"
       NEXT="#notenoughfields">
  <RESPONSE FIELDS="action,amt,fromacct,toacct" NEXT="#doit"/>
  <RESPONSE FIELDS="action,amt,fromacct" NEXT="#asktoacct"/>
  <RESPONSE FIELDS="action,amt,toacct" NEXT="#askfromacct"/>
  <RESPONSE FIELDS="action,amt" NEXT="#askaccts"/>
  <RESPONSE FIELDS="action" NEXT="#askamtaccts"/>
</INPUT>

The example shown above illustrates the use of the RESPONSE element where the user specifies fewer than all the possible variables available in the grammar. Using the RESPONSE element, the application can arrange to collect the information not already filled in by prior steps. In particular, this example transfers to the "askaccts" STEP element if neither the source nor destination account is specified (i.e., the user said "transfer 500 dollars"), but it transfers to the "askfromacct" STEP element if the user said what account to transfer to, but did not specify a source account (i.e., if the user had said "transfer 100 dollars to savings"). The next URL of the INPUT element is used when the user's response does not match any of the defined responses.
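
The dispatch on filled slots can be sketched as follows. This Python model assumes that a RESPONSE element matches when its FIELDS list equals exactly the set of slots the grammar filled; that matching rule, the function choose_next, and the sample slot values are assumptions for illustration rather than behavior stated by the specification.

```python
from typing import Dict, List, Set, Tuple

# (FIELDS, NEXT) pairs in document order, taken from the example.
RESPONSES: List[Tuple[Set[str], str]] = [
    ({"action", "amt", "fromacct", "toacct"}, "#doit"),
    ({"action", "amt", "fromacct"}, "#asktoacct"),
    ({"action", "amt", "toacct"}, "#askfromacct"),
    ({"action", "amt"}, "#askaccts"),
    ({"action"}, "#askamtaccts"),
]
INPUT_NEXT = "#notenoughfields"  # NEXT of the enclosing INPUT element


def choose_next(filled: Dict[str, str]) -> str:
    """Return the NEXT target whose FIELDS equal the set of filled slots."""
    filled_names = set(filled)
    for fields, target in RESPONSES:
        if fields == filled_names:  # assumed rule: exact match of slot names
            return target
    return INPUT_NEXT  # no RESPONSE matched


# "transfer 500 dollars": neither account was given.
print(choose_next({"action": "transfer", "amt": "500"}))
# "transfer 100 dollars to savings": only the destination was given.
print(choose_next({"action": "transfer", "amt": "100", "toacct": "savings"}))
```
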

The SWITCH element of the markup language (i.e., <SWITCH FIELD="value"> vml </SWITCH>) is used to define the application behavior dependent on the value of a specified recognition slot. The SWITCH element includes a FIELD attribute. The value of the FIELD attribute can be an identifier. The SWITCH element is used in conjunction with the CASE element. The SWITCH element can exist within the INPUT element, and then only when using the grammar input type.

The following example illustrates the use of the SWITCH element in a markup language document.
<INPUT TYPE="GRAMMAR"
       SRC="gram://.Banking/action/amount/fromacct/toacct">
  <SWITCH FIELD="action">
    <CASE VALUE="transfer" NEXT="#transfer"/>
    <CASE VALUE="balance" NEXT="#balance"/>
    <CASE VALUE="activity">
      <SWITCH FIELD="fromacct">
        <CASE VALUE="checking" NEXT="#chxact"/>
        <CASE VALUE="savings" NEXT="#savact"/>
      </SWITCH>
    </CASE>
  </SWITCH>
</INPUT>

In the example shown above, the SWITCH element is used to determine the next STEP element to execute in response to a banking request. In this example, the grammar may fill in some or all of the variables (i.e., "action", "amount", "fromacct", and "toacct"). If the user asks for a transfer or balance action, the next STEP element to execute is the transfer or balance step. If the user asks for a report of account activity, a second SWITCH element determines the next STEP element based on the account type for which a report is being requested (assumed to be available in the "fromacct" variable).
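
The nested evaluation can be illustrated with a short recursive function. The Python sketch below parses the SWITCH markup with the standard xml.etree module; the function evaluate_switch is hypothetical, and returning None when no CASE matches is a placeholder, since the behavior in that situation is not described here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

SWITCH_XML = """
<SWITCH FIELD="action">
  <CASE VALUE="transfer" NEXT="#transfer"/>
  <CASE VALUE="balance" NEXT="#balance"/>
  <CASE VALUE="activity">
    <SWITCH FIELD="fromacct">
      <CASE VALUE="checking" NEXT="#chxact"/>
      <CASE VALUE="savings" NEXT="#savact"/>
    </SWITCH>
  </CASE>
</SWITCH>
"""


def evaluate_switch(switch: ET.Element, variables: Dict[str, str]) -> Optional[str]:
    """Return the NEXT target selected by a SWITCH, following nested SWITCHes."""
    value = variables.get(switch.get("FIELD", ""))
    for case in switch.findall("CASE"):
        if case.get("VALUE") != value:
            continue
        if case.get("NEXT") is not None:
            return case.get("NEXT")
        # A CASE without NEXT defers to the SWITCH nested inside it.
        nested = case.find("SWITCH")
        return evaluate_switch(nested, variables) if nested is not None else None
    return None  # no CASE matched; handling of this case is outside the sketch


root = ET.fromstring(SWITCH_XML)
print(evaluate_switch(root, {"action": "activity", "fromacct": "savings"}))
```
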

The VALUE element of the markup language (i.e., <VALUE NAME="value"/>) is used to present the value of a variable to the user via the text-to-speech unit. The VALUE element includes a NAME attribute. The value of the NAME attribute can be an identifier. The VALUE element can be used anywhere that text is read to the user. The VALUE element can be contained by a PROMPT, EMP, PROS, HELP, ERROR, CANCEL, or ACK element.

The following example illustrates the use of the VALUE element in a markup language document.

<STEP NAME="thanks">
  <PROMPT> Thanks for your responses. I'll record
    that <VALUE NAME="first"/> is your favorite
    and that <VALUE NAME="second"/> is your
    second choice.
  </PROMPT>
  <INPUT TYPE="NONE" NEXT="/recordresults.asp"/>
</STEP>

The example shown above illustrates the use of the VALUE element to read the user's selections back to the user. As shown above, the value of the variable named "first" would be inserted into the PROMPT element, and the value of the variable named "second" would be inserted into the PROMPT element.

The COST attribute of the STEP element of the markup language is used to charge a user for various services. The COST attribute can be used in the definition of one or more STEP or CLASS elements. The value of the COST attribute is the integer number of credits the user is to be charged for viewing the content. For example, to charge 10 credits for listening to a particular STEP element, a provider might write the following markup language:

<STEP NAME="premiumContent" COST="10">
  ... premium content goes here ...
</STEP>

If a content provider wishes to maintain a record of subscriber charges, the content provider need only request identifying data for the user using the PROFILE input type as in:

<INPUT TYPE="PROFILE" PROFNAME="UID" NAME="subID"/>

Using the resulting value and examining the SUB_CHARGE query-string parameter at each page request, the content provider can maintain records on a per-subscriber basis.
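
Such per-subscriber record keeping might look like the Python sketch below on the content provider's side. It assumes that the subscriber identifier arrives in the query string under the name subID and that SUB_CHARGE carries an integer number of credits; those details, the function record_request, and the example.com URLs are assumptions for illustration only.

```python
from collections import defaultdict
from typing import DefaultDict
from urllib.parse import parse_qs, urlparse

# Running total of credits charged, keyed by subscriber identifier.
charges: DefaultDict[str, int] = defaultdict(int)


def record_request(url: str) -> None:
    """Add the request's SUB_CHARGE (if any) to the subscriber's running total."""
    query = parse_qs(urlparse(url).query)
    subscriber = query.get("subID", [None])[0]
    charge = query.get("SUB_CHARGE", ["0"])[0]
    if subscriber is not None:
        charges[subscriber] += int(charge)


# Hypothetical page requests as they might arrive at the content provider.
record_request("http://example.com/premium.asp?subID=alice&SUB_CHARGE=10")
record_request("http://example.com/premium.asp?subID=alice&SUB_CHARGE=5")
record_request("http://example.com/free.asp?subID=bob")
print(dict(charges))
```
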

The following text describes a weather application 500 that can be executed by the system 200 of FIG. 3. FIG. 8 shows an exemplary state diagram of the weather application containing states that prompt the user for input in order to access the weather database. After speaking the current or forecast weather information, the application expects the user to say a city name or the word "exit" to return to the main welcome prompt. The user can select to hear the forecast after the current weather conditions prompt. It will be recognized that the application could be designed to address errors, help and cancel requests properly.

The markup language set forth below is a static version of the weather application. The initial state or welcome prompt is within the first step, init (lines 11-20). The user can respond with a choice of "weather", "market", "news" or "exit". Once the application detects the user's response of "weather", the next step, weather (lines 21-29), begins. The prompt queries the user for a city name. Valid choices are "London", "New York", and "Chicago".

The steps called london_current, london_forecast, newyork_current, newyork_forecast, chicago_current, and chicago_forecast provide weather information prompts for each city. It is noted that the market and news steps are just placeholders in the example (lines 111 and 115).
<?XML VERSION="1.0"?>
<!-- (c) 1998 Motorola Inc. -->
<!-- weather.vml -->

<DIALOG>
  <CLASS NAME="help_top">
    <HELP>You are at the top level menu. For weather information,
      say weather. </HELP>
  </CLASS>

  <STEP NAME="init" PARENT="help_top">
    <PROMPT>Welcome to Genie. <BREAK SIZE="large"/>
      How may I help you? </PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#weather">weather</OPTION>
      <OPTION NEXT="#market">market</OPTION>
      <OPTION NEXT="#news">news</OPTION>
      <OPTION NEXT="#bye">exit</OPTION>
    </INPUT>
  </STEP>

  <STEP NAME="weather" PARENT="help_top">
    <PROMPT>What city? </PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#london_current">london</OPTION>
      <OPTION NEXT="#newyork_current">new york</OPTION>
      <OPTION NEXT="#chicago_current">chicago</OPTION>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>

  <CLASS NAME="help_generic">
    <HELP>Your choices are <OPTIONS/>. </HELP>
  </CLASS>

  <STEP NAME="london_current" PARENT="help_generic">
    <PROMPT>It is currently 46 degrees in London, with rain.
      <BREAK SIZE="large"/>
      To hear the 3 day forecast for London, say forecast, or say
      another city name, such as Chicago or New York.</PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#london_forecast">forecast</OPTION>
      <OPTION NEXT="#london_current">london</OPTION>
      <OPTION NEXT="#newyork_current">new york</OPTION>
      <OPTION NEXT="#chicago_current">chicago</OPTION>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>

  <STEP NAME="london_forecast" PARENT="help_generic">
    <PROMPT>London forecast for
      Tuesday. Showers. High of 50. Low of 44.
      Wednesday. Partly cloudy. High of 39. Low of 35.
      <BREAK SIZE="large"/>
      Choose a city, or say exit to return to the main menu.</PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#london_current">london</OPTION>
      <OPTION NEXT="#newyork_current">new york</OPTION>
      <OPTION NEXT="#chicago_current">chicago</OPTION>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>

  <STEP NAME="chicago_current" PARENT="help_generic">
    <PROMPT>It is currently 31 degrees in Chicago, with snow.
      <BREAK SIZE="large"/>
      To hear the 3 day forecast for Chicago, say forecast, or say
      another city name, such as London or New York.</PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#chicago_forecast">forecast</OPTION>
      <OPTION NEXT="#london_current">london</OPTION>
      <OPTION NEXT="#newyork_current">new york</OPTION>
      <OPTION NEXT="#chicago_current">chicago</OPTION>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>

  <STEP NAME="chicago_forecast" PARENT="help_generic">
    <PROMPT>Chicago forecast for
      Tuesday. Flurries. High of 27. Low of 22.
      Wednesday. Snow showers. High of 27. Low of 12.
      <BREAK SIZE="large"/>
      Choose a city, or say exit to return to the main menu.</PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#london_current">london</OPTION>
      <OPTION NEXT="#newyork_current">new york</OPTION>
      <OPTION NEXT="#chicago_current">chicago</OPTION>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>

  <STEP NAME="newyork_current" PARENT="help_generic">
    <PROMPT>It is currently 39 degrees in New York City, with
      cloudy skies. <BREAK SIZE="large"/>
      To hear the 3 day forecast for New York, say forecast, or say
      another city name, such as London or New York.</PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#newyork_forecast">forecast</OPTION>
      <OPTION NEXT="#london_">london</OPTION>
      <OPTION NEXT="#newyork">new york</OPTION>
      <OPTION NEXT="#chicago">chicago</OPTION>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>

  <STEP NAME="newyork_forecast" PARENT="help_generic">
    <PROMPT>New York City forecast for
      Tuesday. Windy. High of 48. Low of 43.
      Wednesday. Rain. High of 43. Low of 28.
      <BREAK SIZE="large"/>
      Choose a city, or say exit to return to the main menu.</PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#london_current">london</OPTION>
      <OPTION NEXT="#newyork_current">new york</OPTION>
      <OPTION NEXT="#chicago.">chicago</OPTION>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>
  <STEP NAME="market">
    <PROMPT>Market update is currently not supported. </PROMPT>
    <INPUT TYPE="NONE" NEXT="#init"/>
  </STEP>

  <STEP NAME="news">
    <PROMPT>News update is currently not supported. </PROMPT>
    <INPUT TYPE="NONE" NEXT="#init"/>
  </STEP>

  <STEP NAME="bye" PARENT="help_top">
    <PROMPT>Thanks for using Genie. Goodbye. </PROMPT>
    <INPUT TYPE="NONE" NEXT="#exit"/>
  </STEP>
</DIALOG>

FIG. 9 illustrates the same state diagram for the
weather application as shown in FIG. 8, with labels for
each dialog boundary. The initial dialog, dialog1,
contains the user prompts for the welcome and the city
name. Dialog1 also controls the prompts for
transitioning to a city's current or forecast weather
and for returning to the main menu. Dialog2 accesses
the weather database for the current conditions of the
city specified by the user and reads the information to
the user. Dialog2 then returns control to dialog1 to
get the user's next request. Similarly, dialog3
accesses the weather database for the forecast of the
requested city and speaks the information. It then
returns control to dialog1 to get the next user input.
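
The hand-offs between the three dialogs can be summarized as a small transition table. The following is a minimal Python sketch, not part of the ASP implementation; the event names ("city", "forecast", "done") and the function names are hypothetical labels chosen for illustration.

```python
# Hypothetical transition table for the three dialogs of FIG. 9.
# Keys are (dialog, event); values name the dialog that receives control next.
# The event names are assumptions made for this sketch.
TRANSITIONS = {
    ("dialog1", "city"): "dialog2",      # user names a city: current conditions
    ("dialog1", "forecast"): "dialog3",  # user asks for the forecast
    ("dialog2", "done"): "dialog1",      # current conditions have been spoken
    ("dialog3", "done"): "dialog1",      # forecast has been spoken
}


def next_dialog(current, event):
    """Return the dialog that takes control after `event` occurs in `current`."""
    return TRANSITIONS[(current, event)]


def run(events, start="dialog1"):
    """Replay a list of events and return the sequence of dialogs visited."""
    visited = [start]
    for event in events:
        visited.append(next_dialog(visited[-1], event))
    return visited
```

For example, `run(["city", "done", "forecast", "done"])` visits dialog1, dialog2, dialog1, dialog3, and dialog1 in that order, which matches the description that dialog2 and dialog3 always return control to dialog1.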

The markup language set forth below illustrates an
example of the weather application corresponding to the
dialog boundaries presented in the state diagram of
FIG. 9. The application is implemented with Active
Server Pages using VBScript. It consists of three files
called dialog1.asp, dialog2.asp, and dialog3.asp, each
corresponding to the appropriate dialog.
For dialog1, there are two help message types,
help_top and help_dialog1 (lines 16 and 29). The first
step, init, is at line 19. The weather step follows at
line 32. Valid city names are those from the CityList
table (line 36) of the weather database. Lines 7 and 8
accomplish the database connection via ADO. Line 38 is
the start of a loop for creating an option list of all
possible city responses. If the user chooses a city,
control goes to the step getcurrentweather in dialog2,
as shown at line 40. In this case, the city name is
also passed to dialog2 via the variable CITY at line 34.
The last major step in dialog1 is nextcommand, which can
be referenced by dialog2 or dialog3. It prompts the user
for a city name or the word forecast. Similar to the
weather step, nextcommand uses a loop to create the
option list (line 53). If the user responds with a city
name, the step getcurrentweather in dialog2 is called.
If the user responds with the word forecast, the step
getforecastweather in dialog3 is called instead.
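
The option-list loop described above can be sketched outside ASP as follows. This is an illustrative Python sketch under stated assumptions: the function name build_option_list and its parameters are hypothetical, and the list of cities stands in for the rows returned from the CityList table.

```python
from xml.sax.saxutils import escape, quoteattr


def build_option_list(cities, next_url="dialog2.asp#getcurrentweather"):
    """Build an INPUT element with one OPTION per city plus an exit option.

    `cities` stands in for the rows of the CityList table; the function
    name and parameters are assumptions made for this sketch.
    """
    lines = ['<INPUT TYPE="optionlist" NAME="CITY">']
    for city in cities:
        # Each city becomes an OPTION whose VALUE is passed on as CITY.
        lines.append(
            "  <OPTION NEXT=%s VALUE=%s>%s</OPTION>"
            % (quoteattr(next_url), quoteattr(city), escape(city))
        )
    # The final option returns the user to the init step of the same dialog.
    lines.append('  <OPTION NEXT="#init">exit</OPTION>')
    lines.append("</INPUT>")
    return "\n".join(lines)
```

Calling `build_option_list(["London", "New York"])` yields an INPUT element with three OPTION children: one per city, each pointing at the getcurrentweather step, followed by the exit option.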

Dialog2 contains a single step, getcurrentweather.
The step first reads the city name into the local
variable strCity (line 95). A database query tries to
find a match in the weather database for the city (lines
97 and 98). If no weather information is found for the
city, the application speaks a message (line 101) and
proceeds to the init step in dialog1 (line 110).
Otherwise, the application speaks the current weather
information for the city (line 105) and switches to the
nextcommand step in dialog1 (line 112).
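
The lookup-and-branch logic of getcurrentweather can be sketched as follows. This is a Python sketch using an in-memory SQLite database in place of the ADO connection; the function names and the demo row are invented for illustration. It also swaps in a parameterized query where the ASP listing concatenates the city name into the SQL string.

```python
import sqlite3


def get_current_weather(conn, city):
    """Return (prompt_text, next_step) for the given city.

    Mirrors the branch described above: no matching row leads to the
    init step, a matching row leads to nextcommand. Uses a parameterized
    query, which is a substitution for the listing's string concatenation.
    """
    row = conn.execute(
        'SELECT "Current" FROM WDB WHERE City = ?', (city,)
    ).fetchone()
    if row is None:
        prompt = (
            "Sorry, there are no current weather conditions "
            "available for %s." % city
        )
        return prompt, "dialog1.asp#init"
    return row[0], "dialog1.asp#nextcommand"


def make_demo_db():
    """Create an in-memory WDB table with one invented row (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE WDB (City TEXT, "Current" TEXT, Forecast TEXT)')
    conn.execute(
        "INSERT INTO WDB VALUES (?, ?, ?)",
        ("Chicago", "Sunny and 70 degrees.", "Tuesday. Sunny."),
    )
    return conn
```

With the demo database, a known city returns its stored text together with the nextcommand target, while an unknown city returns the apology prompt together with the init target.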

Dialog3 is similar to dialog2. It contains a
single step, getforecastweather. The database query is
identical to the one in dialog2. If weather information
is available for the city, the application speaks the
weather forecast (line 105); otherwise, a notification
message is spoken (line 101). Dialog3 relinquishes
control back to dialog1 at either the init step (line
110) or the nextcommand step (line 112).



<%@ LANGUAGE="VBSCRIPT" %>
<%
Option Explicit
Private objConnection, rsCities
Private strCity, SQLQuery
' Create and open a connection to the database.
Set objConnection = Server.CreateObject("ADODB.Connection")
objConnection.Open "Weather Database"
%>

<?XML VERSION="1.0"?>
<!-- (c) 1998 Motorola Inc. -->
<!-- dialog1.asp -->

<DIALOG>
  <CLASS NAME="help_top">
    <HELP>You are at the top level menu. For
      weather information,
      say weather.</HELP>
  </CLASS>

  <STEP NAME="init" PARENT="help_top">
    <PROMPT>Welcome to Genie. <BREAK SIZE="large"/>
      How may I help you?</PROMPT>
    <INPUT TYPE="OPTIONLIST">
      <OPTION NEXT="#weather">weather</OPTION>
      <OPTION NEXT="#market">market</OPTION>
      <OPTION NEXT="#news">news</OPTION>
      <OPTION NEXT="#bye">exit</OPTION>
    </INPUT>
  </STEP>

  <CLASS NAME="help_dialog1">
    <HELP>Your choices are <OPTIONS/>.</HELP>
  </CLASS>

  <STEP NAME="weather" PARENT="help_dialog1">
    <PROMPT>What city?</PROMPT>
    <INPUT TYPE="optionlist" NAME="CITY">
      <% ' Get all city names. %>
      <% SQLQuery = "SELECT * FROM CityList" %>
      <% Set rsCities = objConnection.Execute(SQLQuery) %>
      <% Do Until rsCities.EOF %>
      <% ' Create an OPTION element for each city. %>
      <OPTION
        NEXT="dialog2.asp#getcurrentweather"
        VALUE="<%= rsCities("City") %>">
        <%= rsCities("City") %></OPTION>
      <% rsCities.MoveNext %>
      <% Loop %>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>

  <STEP NAME="nextcommand" PARENT="help_dialog1">
    <% strCity = Request.QueryString("CITY") %>
    <PROMPT>To hear the 3 day forecast for
      <%=strCity%>, say
      forecast, or say another city
      name.</PROMPT>
    <INPUT TYPE="optionlist" NAME="CITY">
      <% ' Get all city names. %>
      <% SQLQuery = "SELECT * FROM CityList" %>
      <% Set rsCities = objConnection.Execute(SQLQuery) %>
      <% Do Until rsCities.EOF %>
      <% ' Create an OPTION element for each city. %>
      <OPTION
        NEXT="dialog2.asp#getcurrentweather"
        VALUE="<%= rsCities("City") %>">
        <%= rsCities("City") %></OPTION>
      <% rsCities.MoveNext %>
      <% Loop %>
      <OPTION
        NEXT="dialog3.asp#getforecastweather"
        VALUE="<%= strCity %>">forecast</OPTION>
      <OPTION NEXT="#init">exit</OPTION>
    </INPUT>
  </STEP>

  <STEP NAME="market">
    <PROMPT>Market update is currently not
      supported.</PROMPT>
    <INPUT TYPE="NONE" NEXT="#init"/>
  </STEP>

  <STEP NAME="news">
    <PROMPT>News update is currently not
      supported.</PROMPT>
    <INPUT TYPE="NONE" NEXT="#init"/>
  </STEP>

  <STEP NAME="bye" PARENT="help_top">
    <PROMPT>Thanks for using Genie. Goodbye.
    </PROMPT>
    <INPUT TYPE="NONE" NEXT="#exit"/>
  </STEP>
</DIALOG>
<!-- End of Dialog1.asp -->

<%@ LANGUAGE="VBSCRIPT" %>
<%
Option Explicit
Private objConnection, rsWeather, SQLQuery
Private strCity, Valid
' Create and open a connection to the database.
Set objConnection = Server.CreateObject("ADODB.Connection")
objConnection.Open "Weather Database"
%>

<?XML VERSION="1.0"?>
<!-- (c) 1998 Motorola Inc. -->
<!-- dialog2.asp -->

<DIALOG>
  <CLASS NAME="help_dialog2">
    <HELP>Your choices are <OPTIONS/>.</HELP>
  </CLASS>

  <STEP NAME="getcurrentweather">
    <% strCity = Request.QueryString("CITY") %>
    <% Valid = "TRUE" %>
    <% SQLQuery = "SELECT * FROM WDB WHERE (City='" & strCity & "')" %>
    <% Set rsWeather = objConnection.Execute(SQLQuery) %>
    <% If rsWeather.EOF Then %>
    <% Valid = "FALSE" %>
    <PROMPT>Sorry, <BREAK/> There are no
      current weather
      conditions available for
      <%=strCity%>.<BREAK/></PROMPT>
    <% Else %>
    <% ' Speak current weather information %>
    <PROMPT><%=rsWeather("Current")%></PROMPT>
    <% End If %>
    <INPUT TYPE="Hidden" NAME="CITY"
      VALUE="<%=strCity%>">
    </INPUT>
    <% If (Valid = "FALSE") Then %>
    <INPUT TYPE="none"
      NEXT="dialog1.asp#init"></INPUT>
    <% Else %>
    <INPUT TYPE="none"
      NEXT="dialog1.asp#nextcommand"></INPUT>
    <% End If %>
  </STEP>
</DIALOG>
<!-- End of Dialog2.asp -->

<%@ LANGUAGE="VBSCRIPT" %>
<%
Option Explicit
Private objConnection, rsWeather, SQLQuery
Private strCity, Valid
' Create and open a connection to the database.
Set objConnection = Server.CreateObject("ADODB.Connection")
objConnection.Open "Weather Database"
%>

<?XML VERSION="1.0"?>
<!-- (c) 1998 Motorola Inc. -->
<!-- dialog3.asp -->


<DIALOG>
  <CLASS NAME="help_dialog3">
    <HELP>Your choices are <OPTIONS/>.</HELP>
  </CLASS>

  <STEP NAME="getforecastweather">
    <% strCity = Request.QueryString("CITY") %>
    <% Valid = "TRUE" %>
    <% SQLQuery = "SELECT * FROM WDB WHERE (City='" & strCity & "')" %>
    <% Set rsWeather = objConnection.Execute(SQLQuery) %>
    <% If rsWeather.EOF Then %>
    <% Valid = "FALSE" %>
    <PROMPT>Sorry, <BREAK/> There is no forecast
      weather
      available for <%=strCity%>.<BREAK/></PROMPT>
    <% Else %>
    <% ' Speak forecast weather information %>
    <PROMPT><%=rsWeather("Forecast")%></PROMPT>
    <% End If %>
    <INPUT TYPE="Hidden" NAME="CITY"
      VALUE="<%=strCity%>"></INPUT>
    <% If (Valid = "FALSE") Then %>
    <INPUT TYPE="none" NEXT="dialog1.asp#init"></INPUT>
    <% Else %>
    <INPUT TYPE="none"
      NEXT="dialog1.asp#nextcommand"></INPUT>
    <% End If %>
  </STEP>
</DIALOG>
<!-- End of Dialog3.asp -->



Accordingly, there have been described herein
methods and systems to allow users to access information
from any location in the world via any suitable network
access device. The user can access up-to-date
information, such as news updates, designated city
weather, traffic conditions, stock quotes, and stock
market indicators. The system also allows the user to
perform various transactions (e.g., order flowers, place
orders from restaurants, place buy or sell orders for
stocks, obtain bank account balances, obtain telephone
numbers, and receive directions to destinations).

It will be apparent to those skilled in the art
that the disclosed embodiment may be modified in
numerous ways and may assume many embodiments other than
the preferred form specifically set out and described
above. Accordingly, it is intended by the appended
claims to cover all modifications of the invention which
fall within the true spirit and scope of the invention.
What is claimed is:

1. A markup language stored on a computer-
readable medium to provide interactive services
comprising:

    a dialog element including a plurality of
markup language elements, each of the plurality of
markup language elements being identifiable by at
least one markup tag;

    a step element contained within the dialog
element to define a state within the dialog
element, the step element including a prompt
element and an input element;

    the prompt element including an announcement
to be read to the user; and

    the input element including at least one input
that corresponds to a user input.

2. The markup language document of claim 1,
wherein the announcement comprises one of voice over
internet protocol data and textual data.

3. The markup language of claim 1, wherein
the dialog element further contains a help element
to identify a help request from the user.

4. The markup language of claim 1, wherein
the dialog element further contains an error
element to identify an error.

5. The markup language document of claim 1,
wherein the step element further contains one of a
name attribute, a bargein attribute, a parent
attribute, and a cost attribute.

6. The markup language of claim 1, wherein
the dialog element further contains an audio
element including audio data to be played to the
user.

7. A method of creating a voice application
program to provide user interactive voice services,
the method comprising the steps of:

    creating a markup language document
having a plurality of elements;

    selecting a prompt element;

    defining a voice communication in the prompt
element to be read to the user;

    selecting an input element; and

    defining an input variable to store data
inputted by the user.

8. A program stored on a computer-readable
medium to provide interactive voice services
comprising the steps of:

    associating a voice communication to be read
to a user to define a prompt element; and

    associating at least one option corresponding
to a user input to define an input element.



WO 00/05643 PCT/US99/16777



9. A markup language document stored on a
computer-readable medium to provide interactive
voice services comprising:

    a dialog element including a help element, the
dialog element being identified by at least one
markup tag; and

    the help element including at least one prompt
to be played to the user in response to a help
request from the user.

10. A markup language document stored on a
computer-readable medium to provide interactive
voice services comprising:

    a dialog element including an error element,
the dialog element being identified by at least one
markup tag; and

    the error element including at least one
prompt to be played to the user in response to an
error.



[FIG. 1: block diagram showing a network access device (102), an electronic network (104), and an information source (106).]



[FIG. 2: flowchart. Start; call into network or system; answering incoming call; provide prompt to caller; wait for response; establish a connection to information source; retrieve information; provide information to caller.]



[FIG. 4: block diagram showing a network fetcher, a parser (302), an interpreter, and a state machine, with connections to a LAN.]



[FIG. 5A: flowchart. Determine initial address and a step element; fetch contents of current address; parse contents and build local step table; play prompt (if any) for current step; collect input (if any) from current step; determine and play appropriate error response; determine and play appropriate help response. Connectors: RE-FETCH, PLAY CURRENT STEP, REPEAT.]



[FIG. 5B: flowchart, continued. Determine and play appropriate cancel response; determine appropriate next step; decision: next step specified in cancel response? Connectors: REPEAT, RE-FETCH, PLAY CURRENT STEP.]



[FIG. 5C: flowchart. Decision: pre-existing grammar? (yes/no). Blocks: send pre-existing grammar to VRU server; recognize input; look up pronunciation in dictionary; use phonetic rules and pronunciation to generate grammars; send grammars to VRU unit.]



1  <?XML VERSION="1.0"?>
2  <DIALOG>
3  <STEP NAME="INIT">
4  <PROMPT>WHAT MEAL WOULD YOU LIKE TO HEAR THE SPECIALS
5  FOR?</PROMPT>
6  <INPUT TYPE="OPTIONLIST">
7  <OPTION NEXT="#BKFST"> BREAKFAST </OPTION>
8  <OPTION NEXT="#LUNCH"> LUNCH </OPTION>
9  <OPTION NEXT="#DINNER"> DINNER </OPTION>
10 </INPUT>
11 </STEP>
12
13 <STEP NAME="BKFST">
14 <PROMPT> OUR BREAKFAST SPECIAL IS GREEN EGGS AND HAM </PROMPT>
15 </STEP>
16
17 <STEP NAME="LUNCH">
18 <PROMPT> OUR LUNCH SPECIAL IS A BACON, LETTUCE, AND TOMATO
19 SANDWICH. </PROMPT>
20 </STEP>
21
22 <STEP NAME="DINNER">
23 <PROMPT> OUR DINNER SPECIAL TODAY IS ROAST BEEF AND MASHED
24 POTATOES. </PROMPT>
25 </STEP>
26 </DIALOG>